Version Overview:

This version performs the following:

  1. Top feature selection based on trained models’ feature importance.

    This depends on the number of CpGs selected and on the feature selection method used.

    The feature selection methods serve two different purposes: one is binary classification, the other is multi-class classification.

  2. Top feature selection based on trained models’ feature importance with different selection methods.

    There are several selection methods, for example based on mean feature importance, median (quantile) feature importance, and frequency / common feature importance.

    • The frequency / common feature importance is computed as follows:
      1. Select the top number of features (say 40) for each model.
      2. Calculate how frequently each feature appears among the top features selected in step 1.
      3. Consider each feature that appears more than half the time important, and collect these important features as common features.
  3. Output two data frames that will be used in Pareto optimization.

    One is the filtered data frame containing the top number of features under the chosen selection method.

    The other one is the phenotype data frame.

  4. Evaluate the performance of the features selected by the three methods.
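The frequency / common feature selection in step 2 can be sketched as below. This is a minimal illustration, not the document's actual implementation: the function name and the `importance_list` input (assumed to be a list of per-model named importance vectors) are hypothetical.

```r
# Sketch of the frequency / common feature selection (hypothetical helper;
# 'importance_list' is assumed to be a list of named numeric vectors, one
# per trained model, mapping feature names to importance scores).
select_common_features <- function(importance_list, top_n = 40) {
  # Step 1: take each model's top_n features by importance
  top_sets <- lapply(importance_list, function(imp) {
    names(sort(imp, decreasing = TRUE))[seq_len(min(top_n, length(imp)))]
  })
  # Step 2: count how often each feature appears across the models
  freq <- table(unlist(top_sets))
  # Step 3: keep features appearing in more than half of the models
  names(freq)[freq > length(importance_list) / 2]
}
```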

Input Session

This part is a collection of inputs; change them as needed.

File Path :

csv_Ni1905FilePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\ADNI_covariate_withEpiage_1905obs.csv"

TopSelectedCpGs_filePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\Top5K_CpGs.csv"

Number of Top CpGs kept:

# Number of Top CpGs kept based on standard deviation
Number_N_TopNCpGs<-params$INPUT_Number_N_TopNCpGs

Session Input:

Session 1.6.1 Missing Value

# Go to the INPUT Session and find "Impute_NA_FLAG_NUM":
# to impute NAs with the mean, set "Impute_NA_FLAG_NUM = 1"
# to impute NAs with the KNN method, set "Impute_NA_FLAG_NUM = 2"

Impute_NA_FLAG_NUM = 1

Session 1.6.2 Feature Selection

# Go to the INPUT Session and find "METHOD_FEATURE_FLAG_NUM":
# for 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# for the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# for 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
# for classification of CN vs AD, set "METHOD_FEATURE_FLAG_NUM = 4"
# for classification of CN vs MCI, set "METHOD_FEATURE_FLAG_NUM = 5"
# for classification of MCI vs AD, set "METHOD_FEATURE_FLAG_NUM = 6"

METHOD_FEATURE_FLAG_NUM = 1

Session 7.0 Important Features

# Go to the "INPUT" Session to set the number of common features needed.
# Generally this is for visualization.

NUM_COMMON_FEATURES_SET = 20
NUM_COMMON_FEATURES_SET_Frequency = 20

Session 8.0 Feature Selection and Output

The feature selection methods:

  1. based on mean feature importance (set “INPUT_Method_Mean_Choose = TRUE”)
  2. based on median quantile feature importance (set “INPUT_Method_Median_Choose = TRUE”)
  3. based on feature frequency importance (set “INPUT_Method_Frequency_Choose = TRUE”)
    • Comment: When the feature frequency importance method is used, the input number of features N applies to the first step, selecting the top N features per model; the final number of features kept may not be exactly N.
  4. Setting a method’s input flag to FALSE skips generating data for that method. To output data for every method, set all flags to TRUE. In summary, setting the corresponding flag to TRUE outputs the data set selected by that method.
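The mean- and median-based selections in items 1 and 2 can be sketched as follows. This is illustrative only: the function name and the `importance_mat` input (assumed to be a features-by-models matrix of importance scores) are hypothetical.

```r
# Sketch of mean / median feature selection (hypothetical helper;
# 'importance_mat' is assumed to be a features x models numeric matrix
# with feature names as rownames).
select_top_by_stat <- function(importance_mat, n, stat = c("mean", "median")) {
  stat <- match.arg(stat)
  # aggregate each feature's importance across models, then rank descending
  agg <- apply(importance_mat, 1, stat)
  names(sort(agg, decreasing = TRUE))[seq_len(min(n, nrow(importance_mat)))]
}
```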
# This is the flag for phenotype data output.
# If set to TRUE, check whether the file already exists at the given path; if not, write the file, otherwise do nothing.
# If set to FALSE, do not output the phenotype file.
# NOTICE THAT: the phenotype file is selected from "Merged_df".

phenoOutPUt_FLAG = TRUE
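The write-once behavior described in the comments above could be implemented along these lines; the function name and arguments are hypothetical, not taken from the document's actual code.

```r
# Sketch of the phenotype write-once logic described above (hypothetical
# helper): write the file only if it does not already exist at 'path'.
write_pheno_once <- function(pheno_df, path) {
  if (file.exists(path)) {
    message("Phenotype file already exists; not overwriting.")
  } else {
    write.csv(pheno_df, path, row.names = TRUE)
    message("Phenotype file written.")
  }
  invisible(file.exists(path))
}
```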
  

  
# For 8.0 Feature Selection and Output :
# NUM_FEATURES <- INPUT_NUMBER_FEATURES
#   This is the number of features needed
# Method_Selected_Choose <- INPUT_Method_Selected_Choose
#   This is the method used for the output-stage feature selection


INPUT_NUMBER_FEATURES = params$INPUT_OUT_NUMBER_FEATURES
INPUT_Method_Mean_Choose = TRUE
INPUT_Method_Median_Choose = TRUE
INPUT_Method_Frequency_Choose = TRUE



if(INPUT_Method_Mean_Choose|| INPUT_Method_Median_Choose || INPUT_Method_Frequency_Choose){
  OUTUT_file_directory<- "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method1_MultiClass\\Method1_MultiClass_SelectedFeatures\\"
  
  OUTUT_CSV_PATHNAME <- paste(OUTUT_file_directory,"INPUT_",Number_N_TopNCpGs,"CpGs\\",sep="")
  
  if (dir.exists(OUTUT_CSV_PATHNAME)) {
    message("Directory already exists.")
  } else {
    dir.create(OUTUT_CSV_PATHNAME, recursive = TRUE)
    message("Directory created.")
  }
  
}
## Directory already exists.

Session 10.0 Performance Metrics

FLAG_WRITE_METRICS_DF is the flag controlling whether to output the CSV containing the performance metrics.

# This is the flag for outputting this file's metrics, including model-training-stage metrics and performance metrics for the key features selected by the mean, median, and frequency methods.

Metrics_Table_Output_FLAG = TRUE


FLAG_WRITE_METRICS_DF = TRUE

if(FLAG_WRITE_METRICS_DF){
  OUTUT_PerfMertics_directory<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method1_MultiClass\\Method1_MultiClass_PerformanceMetrics\\"
  
  OUTUT_PerformanceMetricsCSV_PATHNAME <- paste(OUTUT_PerfMertics_directory,"INPUT_",Number_N_TopNCpGs,"CpGs_",INPUT_NUMBER_FEATURES,"SelFeature_PerMetrics.csv",sep="")
  
  if (dir.exists(OUTUT_PerfMertics_directory)) {
    message("Directory already exists.")
  } else {
    dir.create(OUTUT_PerfMertics_directory, recursive = TRUE)
    message("Directory created.")
  }
  print(OUTUT_PerformanceMetricsCSV_PATHNAME)
  
}
## Directory already exists.
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method1_MultiClass\\Method1_MultiClass_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"

1. Preprocess

Packages and libraries that may need to be installed and loaded.

# Function to check and install Bioconductor package: "limma"

install_bioc_packages <- function(packages) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  for (pkg in packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      BiocManager::install(pkg, dependencies = TRUE)
    } else {
      message(paste("Package", pkg, "is already installed."))
    }
  }
}


install_bioc_packages("limma")
## Package limma is already installed.
print("The required packages are all successfully installed.")
## [1] "The required packages are all successfully installed."
library(limma)

Set seed for reproducibility.

set.seed(123)

1.1 Data Read and Preview

csv_NI1905<-read.csv(csv_Ni1905FilePath)
csv_NI1905_RAW <- csv_NI1905
TopSelectedCpGs<-read.csv(TopSelectedCpGs_filePath, check.names = FALSE)
TopSelectedCpGs_RAW <- TopSelectedCpGs

1.1.1 csv_NI1905 (“ADNI_covariate_withEpiage_1905obs.csv”)

head(csv_NI1905,n=3)
rownames(csv_NI1905)<-as.matrix(csv_NI1905[,"barcodes"])
dim(csv_NI1905)
## [1] 1905   23

1.1.2 TopSelectedCpGs

dim(TopSelectedCpGs)
## [1] 5000 1921
head(TopSelectedCpGs[,1:8])
rownames(TopSelectedCpGs)<-TopSelectedCpGs[,1]
head(rownames(TopSelectedCpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopSelectedCpGs))
## [1] "ProbeID"             "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01"
## [5] "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopSelectedCpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01"
## [5] "201046290111_R08C01" "sdDev"

1.1.3 “TopN_CpGs”

1.1.3.1 Select Top N CpGs

This part adjusts the CpGs to use: it keeps the top N CpGs ranked by standard deviation.

sorted_TopSelectedCpGs <- TopSelectedCpGs[order(-TopSelectedCpGs$sdDev), ]
TopN_CpGs <- head(sorted_TopSelectedCpGs,Number_N_TopNCpGs )
TopN_CpGs_RAW<-TopN_CpGs

Variable “TopN_CpGs” will be used for processing the data. Now let’s take a look at it.

1.1.3.2 Preview “TopN_CpGs”

dim(TopN_CpGs)
## [1] 5000 1921
rownames(TopN_CpGs)<-TopN_CpGs[,1]
head(rownames(TopN_CpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopN_CpGs))
## [1] "ProbeID"             "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01"
## [5] "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopN_CpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01"
## [5] "201046290111_R08C01" "sdDev"

1.2 Check Duplicates

Now, let’s check for duplicate Sample IDs (“barcodes”):

Start with people who don’t have a unique ID (“uniqueID = 0”):

library(dplyr)
dim(csv_NI1905[csv_NI1905$uniqueID == 0, ])
## [1] 1256   23
dim(csv_NI1905[csv_NI1905$uniqueID == 1, ])
## [1] 649  23
duplicates <-  csv_NI1905[csv_NI1905$uniqueID == 0, ] %>%
  group_by(barcodes) %>%
  filter(n() > 1) %>%
  ungroup()

print(dim(duplicates))
## [1]  0 23
rm(duplicates)

Based on the dimensions above, these records all have distinct Sample IDs (“barcodes”).

Then check all records for duplicated Sample IDs (“barcodes”).

duplicates <-  csv_NI1905 %>%
  group_by(barcodes) %>%
  filter(n() > 1) %>%
  ungroup()
print(dim(duplicates))
## [1]  0 23

From the above output, we can see the Sample IDs (“barcodes”) are unique.

names(csv_NI1905)
##  [1] "barcodes"    "RID.a"       "prop.B"      "prop.NK"     "prop.CD4T"   "prop.CD8T"  
##  [7] "prop.Mono"   "prop.Neutro" "prop.Eosino" "DX"          "age.now"     "PTGENDER"   
## [13] "ABETA"       "TAU"         "PTAU"        "PC1"         "PC2"         "PC3"        
## [19] "ageGroup"    "ageGroupsq"  "DX_num"      "uniqueID"    "Horvath"

The same person may appear at different time points, so we only keep records with a unique ID (“uniqueID = 1”).

csv_NI1905<-csv_NI1905[csv_NI1905$uniqueID == 1, ]
dim(csv_NI1905)
## [1] 649  23

1.3 Remove NA values

Since “DX” will be the response variable, we first remove all rows with an NA value in the “DX” column.

# "DX" will be Y,remove all rows with NA value in "DX" column
csv_NI1905<-csv_NI1905 %>% filter(!is.na(DX)) 

1.4 Sample Name filtering

We keep only the samples that appear in both datasets.

Matrix_sample_names_NI1905 <- as.matrix(csv_NI1905[,"barcodes"])
Matrix_sample_names_TopN_CpGs <- as.matrix(colnames(TopN_CpGs))
common_sample_names<-intersect(Matrix_sample_names_NI1905,Matrix_sample_names_TopN_CpGs)
csv_NI1905 <- csv_NI1905 %>% filter(barcodes %in% common_sample_names)
TopN_CpGs <- TopN_CpGs[, common_sample_names, drop = FALSE]
head(TopN_CpGs[,1:3],n=2)
dim(TopN_CpGs)
## [1] 5000  648
dim(csv_NI1905)
## [1] 648  23

1.5 Merged DataFrame

1.5.1 Merge two datasets

Merge these two datasets and store the result in “merged_df”.

trans_TopN_CpGs<-t(TopN_CpGs)

# Check the total length of the rownames.
# Recall that the sample names have been matched and neither data frame has duplicates.
# Now, order the rownames and bind the data frames together. This ensures the rows of the merged data frame are correctly matched.

trans_TopN_CpGs_ordered<-trans_TopN_CpGs[order(rownames(trans_TopN_CpGs)),]
csv_NI1905_ordered<-csv_NI1905[order(rownames(csv_NI1905)),]
print("The rownames match in order:")
## [1] "The rownames match in order:"
check_1 = length(rownames(csv_NI1905_ordered))
check_2 = sum(rownames(csv_NI1905_ordered)==rownames(trans_TopN_CpGs_ordered))
print(check_1==check_2)
## [1] TRUE
merged_df_raw<-cbind(trans_TopN_CpGs_ordered,csv_NI1905_ordered)
phenotic_features_RAW<-colnames(csv_NI1905)
print(phenotic_features_RAW)
##  [1] "barcodes"    "RID.a"       "prop.B"      "prop.NK"     "prop.CD4T"   "prop.CD8T"  
##  [7] "prop.Mono"   "prop.Neutro" "prop.Eosino" "DX"          "age.now"     "PTGENDER"   
## [13] "ABETA"       "TAU"         "PTAU"        "PC1"         "PC2"         "PC3"        
## [19] "ageGroup"    "ageGroupsq"  "DX_num"      "uniqueID"    "Horvath"
phenoticPart_RAW <- merged_df_raw[,phenotic_features_RAW]
dim(phenoticPart_RAW)
## [1] 648  23
head(phenoticPart_RAW)
head(merged_df_raw[,1:3])
merged_df<-merged_df_raw

1.5.2 “merged_df”

head(colnames(merged_df))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"

1.5.3 Feature Names

(1) CpGs (beta values)

The CpG feature names can be accessed via “featureName_CpGs”:

featureName_CpGs<-rownames(TopN_CpGs)
length(featureName_CpGs)
## [1] 5000
head(featureName_CpGs)
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"

1.6 Clean Merged datasets

clean_merged_df<-merged_df

1.6.1 Missing Value

missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## ABETA   TAU  PTAU 
##   109   109   109

Choose Output Data

Choose the imputation method to apply to the data. The output dataset name is “clean_merged_df”.

# Go to the INPUT Session and find "Impute_NA_FLAG_NUM":
# to impute NAs with the mean, set "Impute_NA_FLAG_NUM = 1"
# to impute NAs with the KNN method, set "Impute_NA_FLAG_NUM = 2"

Impute_NA_FLAG = Impute_NA_FLAG_NUM

(1) Impute with Mean

if (Impute_NA_FLAG == 1){
  clean_merged_df_imputed_mean<-clean_merged_df

  mean_ABETA_rmNA <- mean(clean_merged_df$ABETA, na.rm = TRUE)
  clean_merged_df_imputed_mean$ABETA[
    is.na(clean_merged_df_imputed_mean$ABETA)] <- mean_ABETA_rmNA

  mean_TAU_rmNA <- mean(clean_merged_df$TAU, na.rm = TRUE)
  clean_merged_df_imputed_mean$TAU[
    is.na(clean_merged_df_imputed_mean$TAU)] <- mean_TAU_rmNA

  mean_PTAU_rmNA <- mean(clean_merged_df$PTAU, na.rm = TRUE)
  clean_merged_df_imputed_mean$PTAU[
    is.na(clean_merged_df_imputed_mean$PTAU)] <- mean_PTAU_rmNA
  
  clean_merged_df = clean_merged_df_imputed_mean 
}

(2) Impute with KNN

library(VIM)
if (Impute_NA_FLAG == 2){
  df_imputed_KNN <- kNN(merged_df, k = 5)
  imputed_summary <- colSums(df_imputed_KNN[, grep("_imp", names(df_imputed_KNN))])
  print(imputed_summary[imputed_summary > 0])
  clean_merged_df<-df_imputed_KNN[, -grep("_imp", names(df_imputed_KNN))]
}

Check that the missing value problem is solved:

missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## named numeric(0)

1.6.2 Feature Selection

Choose Method to Use

Choose the method we want to use:

# Go to the INPUT Session and find "METHOD_FEATURE_FLAG_NUM":
# for 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# for the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# for 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"

METHOD_FEATURE_FLAG = METHOD_FEATURE_FLAG_NUM

(1) Method One

if (METHOD_FEATURE_FLAG ==  1){
  df_fs_method1 <- clean_merged_df
}
Picking Features
if(METHOD_FEATURE_FLAG ==  1){
  
  phenotic_features_m1<-c("DX","age.now","PTGENDER",
                          "PC1","PC2","PC3")
  pickedFeatureName_m1<-c(phenotic_features_m1,featureName_CpGs)
  df_fs_method1<-clean_merged_df[,pickedFeatureName_m1]
  df_fs_method1$DX<-as.factor(df_fs_method1$DX)
  df_fs_method1$PTGENDER<-as.factor(df_fs_method1$PTGENDER)
  head(df_fs_method1[,1:5],n=3)
  dim(df_fs_method1)
}
## [1]  648 5006
if(METHOD_FEATURE_FLAG ==  1){
  dim(df_fs_method1)
}
## [1]  648 5006
Perform DMP - Use LIMMA

Create contrast matrix for comparing CN vs Dementia vs MCI

if(METHOD_FEATURE_FLAG == 1){

  pheno_data_m1 <- df_fs_method1[,phenotic_features_m1] 
  head(pheno_data_m1[,1:5],n=3)
  
  pheno_data_m1$DX <- factor(pheno_data_m1$DX, levels = c("CN", "MCI", "Dementia"))
  design_m1 <- model.matrix(~ 0 + DX + age.now + PTGENDER + PC1 + PC2 + PC3,
                         data = pheno_data_m1)

  colnames(design_m1)[colnames(design_m1) == "DXCN"] <- "CN"
  colnames(design_m1)[colnames(design_m1) == "DXDementia"] <- "Dementia"
  colnames(design_m1)[colnames(design_m1) == "DXMCI"] <- "MCI"

  head(design_m1)
  
  cpg_matrix_m1 <- t(as.matrix(df_fs_method1[, featureName_CpGs]))
  fit_m1 <- lmFit(cpg_matrix_m1, design_m1)


}
if(METHOD_FEATURE_FLAG == 1){
  # for here, we have three labels. The contrasts to compare groups will be: 
  contrast_matrix_m1 <- makeContrasts(
  MCI_vs_CN = MCI - CN,
  Dementia_vs_CN = Dementia - CN,
  Dementia_vs_MCI = Dementia - MCI,
  levels = design_m1
  )
  fit2_m1 <- contrasts.fit(fit_m1, contrast_matrix_m1)
  fit2_m1 <- eBayes(fit2_m1)
  
  topTable(fit2_m1, coef = "MCI_vs_CN") 
  topTable(fit2_m1, coef = "Dementia_vs_CN")  
  topTable(fit2_m1, coef = "Dementia_vs_MCI") 
  summary_results_m1 <- decideTests(fit2_m1,method = "nestedF", adjust.method = "none", p.value = 0.05)
  table(summary_results_m1)

  
}
## summary_results_m1
##    -1     0     1 
##   134 14732   134
if(METHOD_FEATURE_FLAG == 1){

  significant_dmp_filter_m1 <- summary_results_m1 != 0 
  significant_cpgs_m1_DMP <- unique(rownames(summary_results_m1)[
    apply(significant_dmp_filter_m1, 1, any)])
  print(paste("The significant CpGs after DMP are:",
             paste(significant_cpgs_m1_DMP, collapse = ", ")))
  print(paste("Length of CpGs after DMP:", 
              length(significant_cpgs_m1_DMP)))
  
  pickedFeatureName_m1_afterDMP<-c(phenotic_features_m1,significant_cpgs_m1_DMP)
  df_fs_method1<-df_fs_method1[,pickedFeatureName_m1_afterDMP]

  dim(df_fs_method1)
}
## [1] "The significant CpGs after DMP are: cg03278611, cg02621446, cg23916408, cg12146221, cg05234269, cg14293999, cg19377607, cg14307563, cg21209485, cg11331837, cg11187460, cg14564293, cg12012426, cg00999469, cg17421046, cg27639199, cg24851651, cg16788319, cg25879395, cg18339359, cg12284872, cg15014361, cg24506579, cg05321907, cg10985055, cg20139683, cg26212480, cg10750306, cg26777760, cg01667144, cg27341708, cg12466610, cg03327352, cg02320265, cg08779649, cg13885788, cg25561557, cg01413796, cg26069044, cg03088219, cg12682323, cg17738613, cg17186592, cg17906851, cg01933473, cg16771215, cg02902672, cg05476522, cg16211147, cg11438323, cg27086157, cg17479100, cg15535896, cg18821122, cg05841700, cg10738648, cg16579946, cg20370184, cg02122327, cg12784167, cg15633912, cg02494911, cg15907464, cg21854924, cg17970282, cg25436480, cg12534577, cg15865722, cg23762217, cg06864789, cg10306780, cg24859648, cg26822438, cg01733439, cg18403317, cg16178271, cg00675157, cg10369879, cg18136963, cg22274273, cg01128042, cg27558057, cg08198851, cg04412904, cg11227702, cg04841583, cg01150227, cg20913114, cg02932958, cg00962106, cg15775217, cg21697769, cg09227616, cg03651054, cg16715186, cg00696044, cg12738248, cg03900860, cg04302300, cg01013522, cg00616572, cg05096415, cg01153376, cg09854620, cg24861747, cg19512141, cg06378561, cg11673013, cg02356645, cg02372404, cg11978593, cg06950937, cg00272795, cg02834750, cg05305760, cg03071582, cg08584917, cg23161429, cg07138269, cg13080267, cg25758034, cg23658987, cg25259265, cg17224287, cg14924512, cg08697944, cg14710850, cg06118351, cg11664825, cg07480176, cg08857872, cg20678988, cg06875704, cg24873924, cg01921484, cg12776173, cg07466166, cg00247094, cg03084184, cg17329602, cg20116159, cg01549082, cg20549400, cg26948066, cg07523188, cg26474732, cg24263233, cg11133939, cg02225060, cg19741073, cg12279734, cg12377327, cg10240127, cg23432430, cg16652920, cg06112204, cg12228670, cg21905818, cg19503462, cg07028768, cg14240646, cg13663706, cg09584650, 
cg27272246, cg09418035, cg16749614, cg26506212, cg04664583, cg26757229, cg03982462, cg06715136, cg15501526, cg09092713, cg04248279, cg08434396, cg01680303, cg07158503, cg06536614, cg26219488, cg18819889, cg05570109, cg02981548, cg08861434, cg00689685, cg17429539, cg00322003, cg11247378, cg07152869, cg10796603, cg00154902, cg20201388, cg14527649, cg08800033, cg27452255, cg03129555, cg06697310, cg20507276, cg14961598, cg08108858, cg27577781, cg20685672, cg03660162"
## [1] "Length of CpGs after DMP: 202"
## [1] 648 208
Use “Recipe” to Process Data
if(METHOD_FEATURE_FLAG == 1){
  
  library(recipes)
  df_picked <- df_fs_method1
 
  rec <- recipe(DX ~ ., data = df_picked) %>%
    step_zv(all_predictors()) %>%  
   # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked)

  processed_data_m1 <- bake(rec_prep, new_data = df_picked)
  dim(processed_data_m1)
  processed_data_m1_df<-as.data.frame(processed_data_m1)
  rownames(processed_data_m1_df)<-rownames(df_picked)
}
if(METHOD_FEATURE_FLAG == 1){
  AfterProcess_FeatureName_m1<-colnames(processed_data_m1)
  head(AfterProcess_FeatureName_m1)
  tail(AfterProcess_FeatureName_m1)
}
## [1] "cg06697310" "cg20507276" "cg27577781" "cg20685672" "cg03660162" "DX"
if(METHOD_FEATURE_FLAG == 1){
  head(processed_data_m1[,1:5])
}
if(METHOD_FEATURE_FLAG == 1){
  lastColumn_NUM<-dim(processed_data_m1)[2]
  last5Column_NUM<-lastColumn_NUM-5
  head(processed_data_m1[,last5Column_NUM :lastColumn_NUM])
}

(2) Method Two - PCA

if(METHOD_FEATURE_FLAG == 2){
  bloodPropFeatureName<-c("RID.a","prop.B","prop.NK",
                          "prop.CD4T","prop.CD8T","prop.Mono",
                          "prop.Neutro","prop.Eosino")
  pickedFeatureName_m2<-c("DX","age.now",
                          "PTGENDER",bloodPropFeatureName,
                          "ABETA","TAU","PTAU",featureName_CpGs)
  df_fs_method2<-clean_merged_df[,pickedFeatureName_m2]
}
Use “Recipe” to Preprocess the Data
if(METHOD_FEATURE_FLAG == 2){
  library(recipes)

  rec <- recipe(DX ~ ., data = df_fs_method2) %>%
    step_zv(all_predictors()) %>%
    step_normalize(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_fs_method2)

  processed_data_m2 <- bake(rec_prep, new_data = df_fs_method2)
  dim(processed_data_m2)
}
PCA
if(METHOD_FEATURE_FLAG == 2){
  
  X_df_m2<-subset(processed_data_m2,select = -DX)
  Y_df_m2<-processed_data_m2$DX

  pca_result <- prcomp(X_df_m2, center = TRUE, scale. = TRUE)

  summary(pca_result)

  screeplot(pca_result,type="lines")

}
if(METHOD_FEATURE_FLAG == 2){
  
  PCA_component_threshold<-0.7
}
if(METHOD_FEATURE_FLAG == 2){
  library(caret)
  preproc<-preProcess(X_df_m2,method="pca",
                      thresh = PCA_component_threshold)
  X_df_m2_transformed_PCA <- predict(preproc,X_df_m2)
  data_processed_PCA<-data.frame(X_df_m2_transformed_PCA,Y_df_m2)
  colnames(data_processed_PCA)[
    which(colnames(data_processed_PCA)=="Y_df_m2")]<-"DX"
  head(data_processed_PCA)
}
if(METHOD_FEATURE_FLAG == 2){
  processed_data_m2<-data_processed_PCA
  AfterProcess_FeatureName_m2<-colnames(data_processed_PCA)
}

(3) Method Three - Convert to Binary Class

if(METHOD_FEATURE_FLAG == 3){
  
  df_fs_method3<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 3){
  phenotic_features_m3<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m3<-c(phenotic_features_m3,featureName_CpGs)
  df_picked_m3<-df_fs_method3[,pickedFeatureName_m3]

  df_picked_m3$DX<-as.factor(df_picked_m3$DX)
  df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)
  head(df_picked_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
  dim(df_picked_m3)
}
Change to Two-Class Classification
if(METHOD_FEATURE_FLAG == 3){
  df_picked_m3<-df_picked_m3 %>% mutate(
    DX = ifelse(DX == "CN", "CN",ifelse(DX 
    %in% c("MCI","Dementia"),"CI",NA)))
  
  df_picked_m3$DX<-as.factor(df_picked_m3$DX)
  df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)

  head(df_picked_m3[1:10],n=3)

}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 3){
  pheno_data_m3 <- df_picked_m3[,phenotic_features_m3] 
  head(pheno_data_m3[,1:5],n=3)

  design_m3 <- model.matrix(~0 + .,data=pheno_data_m3)

  colnames(design_m3)[colnames(design_m3) == "DXCN"] <- "CN"
  colnames(design_m3)[colnames(design_m3) == "DXCI"] <- "CI"

  head(design_m3)

  beta_values_m3 <- t(as.matrix(df_fs_method3[,featureName_CpGs]))

}

In order to perform the differential analysis - identifying Differentially Methylated Positions (DMPs) - we have to define the contrast we are interested in. In this method 3, we focus on two groups, giving one contrast of interest.

if(METHOD_FEATURE_FLAG == 3){

  fit_m3 <- lmFit(beta_values_m3, design_m3)
  head(fit_m3$coefficients)


  contrast.matrix <- makeContrasts(CI - CN, levels = design_m3)
 
  fit2_m3 <- contrasts.fit(fit_m3, contrast.matrix)

  # Apply the empirical Bayes’ step to get our differential expression statistics and p-values.

  fit2_m3 <- eBayes(fit2_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  decideTests(fit2_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  dmp_results_m3_try1 <- decideTests(
    fit2_m3, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m3_try1)

}
if(METHOD_FEATURE_FLAG == 3){
  # Identify DMPs, we will use this one:
  dmp_results_m3 <- decideTests(
    fit2_m3, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m3)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 3){

  significant_dmp_filter <- dmp_results_m3 != 0 
  significant_cpgs_m3_DMP <- rownames(dmp_results_m3)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m3_afterDMP<-c(phenotic_features_m3,significant_cpgs_m3_DMP)
  df_picked_m3<-df_picked_m3[,pickedFeatureName_m3_afterDMP]

  dim(df_picked_m3)
}
Visualize the results of DMP

The “Volcano Plot” is one way to visualize the results of a DE analysis.

The x-axis shows the log-fold change in methylation levels between the two classes. The Log Fold Change (LogFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).

Interpretation of logFC:

  • Positive LogFC: Indicates that the measurement is higher in the first group than in the second; here this means hypermethylation (an increase in methylation).

  • Negative LogFC: Indicates that the measurement is lower in the first group than in the second; here this means hypomethylation (a decrease in methylation) in the experimental condition compared to the reference.

  • LogFC of 0: Indicates no difference in the measurement between the two groups.
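As a tiny numeric illustration of the logFC definition stated above (the beta values are made up, not from the dataset):

```r
# Made-up group beta values to illustrate the log2-ratio definition of logFC
group1 <- c(0.80, 0.82, 0.78)  # methylation betas, group 1
group2 <- c(0.40, 0.41, 0.39)  # methylation betas, group 2
logFC <- log2(mean(group1) / mean(group2))
# positive logFC: higher methylation in group 1 (hypermethylation)
```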

The y-axis shows some measure of statistical significance, such as the log-odds, or “B”, statistic. In the following, we will use the B statistic. The log-odds can be calculated by \(B = \log_e(\text{posterior odds})\).

Interpretation of B-value:

  • Higher B-value: Indicates stronger evidence for differential methylation.

  • Lower (or negative) B-value: Indicates weaker evidence for differential methylation.

  • B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.

A characteristic “volcano” shape should be visible. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 3){
  full_results_m3 <- topTable(fit2_m3, number=Inf)
  full_results_m3 <- tibble::rownames_to_column(full_results_m3,"ID")
  head(full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  sorted_full_results_m3 <- full_results_m3[
    order(full_results_m3$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  library(ggplot2)
  ggplot(full_results_m3,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoffs:

if(METHOD_FEATURE_FLAG == 3){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m3 <- full_results_m3 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m3, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to the P value:

if(METHOD_FEATURE_FLAG == 3){
  ggplot(full_results_m3,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 3){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m3 <- full_results_m3 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m3, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “Recipe” to Process Data
if(METHOD_FEATURE_FLAG == 3){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m3) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m3)

  processed_data_m3 <- bake(rec_prep, new_data = df_picked_m3)
  processed_data_m3_df <- as.data.frame(processed_data_m3)
  rownames(processed_data_m3_df) <- rownames(df_picked_m3)
  dim(processed_data_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  AfterProcess_FeatureName_m3<-colnames(processed_data_m3)
  head(AfterProcess_FeatureName_m3)
  tail(AfterProcess_FeatureName_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  levels(df_picked_m3$DX)
}
if(METHOD_FEATURE_FLAG == 3){
  head(processed_data_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
  lastColumn_NUM_m3<-dim(processed_data_m3)[2]
  last5Column_NUM_m3<-lastColumn_NUM_m3-5
  head(processed_data_m3[,last5Column_NUM_m3 :lastColumn_NUM_m3])
}
if(METHOD_FEATURE_FLAG == 3){
  levels(processed_data_m3$DX)
}

(4) Method Four - CN vs AD

In this method, only the CN and AD classes are considered.

if(METHOD_FEATURE_FLAG == 4){
  
  df_fs_method4<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 4){
  phenotic_features_m4<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m4<-c(phenotic_features_m4,featureName_CpGs)
  df_picked_m4<-df_fs_method4[,pickedFeatureName_m4]

  df_picked_m4$DX<-as.factor(df_picked_m4$DX)
  df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)
  head(df_picked_m4[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 4){
  dim(df_picked_m4)
}
Filter and Change to Classification with ‘CN vs AD (Dementia)’
if(METHOD_FEATURE_FLAG == 4){
  df_picked_m4<-df_picked_m4 %>%  filter(DX != "MCI") %>% droplevels()

  
  df_picked_m4$DX<-as.factor(df_picked_m4$DX)
  df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)

  head(df_picked_m4[1:10],n=3)

}
if(METHOD_FEATURE_FLAG == 4){
  print(dim(df_picked_m4))
  print(table(df_picked_m4$DX))
}
if(METHOD_FEATURE_FLAG == 4){
  df_fs_method4 <- df_fs_method4 %>%  filter(DX != "MCI") %>% droplevels()
  df_fs_method4$DX<-as.factor(df_fs_method4$DX)
  print(head(df_fs_method4))
  print(dim(df_fs_method4))
}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 4){
  pheno_data_m4 <- df_picked_m4[,phenotic_features_m4] 
  print(head(pheno_data_m4[,1:5],n=3))

  design_m4 <- model.matrix(~0 + .,data=pheno_data_m4)

  colnames(design_m4)[colnames(design_m4) == "DXCN"] <- "CN"
  colnames(design_m4)[colnames(design_m4) == "DXDementia"] <- "Dementia"

  print(head(design_m4))

  beta_values_m4 <- t(as.matrix(df_fs_method4[,featureName_CpGs]))

}

In order to perform the differential analysis - identifying Differentially Methylated Positions (DMPs) - we have to define the contrast we are interested in. In this method 4, we focus on two groups (CN and Dementia), giving one contrast of interest.

if(METHOD_FEATURE_FLAG == 4){

  fit_m4 <- lmFit(beta_values_m4, design_m4)
  head(fit_m4$coefficients)


  contrast.matrix <- makeContrasts(Dementia - CN, levels = design_m4)
 
  fit2_m4 <- contrasts.fit(fit_m4, contrast.matrix)

  # Apply the empirical Bayes’ step to get our differential expression statistics and p-values.

  fit2_m4 <- eBayes(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  decideTests(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  dmp_results_m4_try1 <- decideTests(
    fit2_m4, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m4_try1)

}

These constraints are too tight; let’s relax them.
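To see why relaxing helps, here is a standalone sketch with hypothetical p-values (not from this dataset): the Benjamini-Hochberg (FDR) adjustment used by `adjust.method = "fdr"` pushes small p-values upward, so switching to `"none"` lets more CpGs pass the same 0.1 threshold.

```r
# Hypothetical p-values, for illustration only
p <- c(0.001, 0.03, 0.06, 0.09, 0.5)

# Benjamini-Hochberg (FDR) adjustment, the same correction applied by
# decideTests(..., adjust.method = "fdr")
p_fdr <- p.adjust(p, method = "fdr")

sum(p < 0.1)      # 4 pass unadjusted
sum(p_fdr < 0.1)  # 2 pass after FDR adjustment
```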

if(METHOD_FEATURE_FLAG == 4){
  # Identify DMPs, we will use this one:
  dmp_results_m4 <- decideTests(
    fit2_m4, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m4)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 4){

  significant_dmp_filter <- dmp_results_m4 != 0 
  significant_cpgs_m4_DMP <- rownames(dmp_results_m4)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m4_afterDMP<-c(phenotic_features_m4,significant_cpgs_m4_DMP)
  df_picked_m4<-df_picked_m4[,pickedFeatureName_m4_afterDMP]

  dim(df_picked_m4)
}
Visualize with the results of DMP

The “Volcano Plot” is one way to visualize the results of a DE analysis.

X-axis shows the log-fold change in methylation levels between the two classes. In general, the Log Fold Change (LogFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\); when beta values are modeled directly, as here, limma’s logFC column is the estimated difference in mean beta values between the groups.

Interpretation of logFC:

  • Positive LogFC: Indicates that the measurement is higher in the first group compared to the second group, here means hypermethylation (increase in methylation).

  • Negative LogFC: Indicates that the measurement is lower in the first group compared to the second group, here means hypomethylation (decrease in methylation) in the experimental condition compared to the reference.

  • LogFC of 0: Indicates no difference in the measurement between the two groups.
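As a quick numeric illustration with hypothetical beta values (not from this dataset): when beta values are fit on their original scale, the contrast coefficient that limma reports as logFC is the difference of the group means, so its sign encodes hyper- vs hypomethylation directly.

```r
# Hypothetical beta values for one CpG in each group (illustrative only)
beta_dementia <- c(0.80, 0.82, 0.78)  # "Group1"
beta_cn       <- c(0.70, 0.69, 0.71)  # "Group2"

# For beta values fit on their original scale, the Dementia - CN contrast
# coefficient equals the difference in group means:
logFC_like <- mean(beta_dementia) - mean(beta_cn)
logFC_like  # positive, i.e. hypermethylated in Dementia relative to CN
```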

Y-axis shows some measure of statistical significance, such as the log-odds, or “B” statistic. In the following, we will use the B statistic. The log-odds is calculated as \(B = \log_e(\text{posterior odds})\).

Interpretation of B-value:

  • Higher B-value: Indicates stronger evidence for differential methylation.

  • Lower (or negative) B-value: Indicates weaker evidence for differential methylation.

  • B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.
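Since B is the natural logarithm of the posterior odds, it maps to a posterior probability of differential methylation via exp(B) / (1 + exp(B)); a minimal standalone sketch with illustrative B values:

```r
# B = log(posterior odds), so posterior probability = exp(B) / (1 + exp(B)),
# which is the logistic function plogis(B)
B <- c(-3, 0, 3)                    # illustrative B statistics
posterior_prob <- plogis(B)
round(posterior_prob, 3)            # 0.047 0.500 0.953
```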

A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 4){
  full_results_m4 <- topTable(fit2_m4, number=Inf)
  full_results_m4 <- tibble::rownames_to_column(full_results_m4,"ID")
  head(full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  sorted_full_results_m4 <- full_results_m4[
    order(full_results_m4$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  library(ggplot2)
  ggplot(full_results_m4,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoffs applied

if(METHOD_FEATURE_FLAG == 4){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m4 <- full_results_m4 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m4, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to -log10(P value)

if(METHOD_FEATURE_FLAG == 4){
  ggplot(full_results_m4,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 4){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m4 <- full_results_m4 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m4, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “Recipe” to Process Data
if(METHOD_FEATURE_FLAG == 4){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m4) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m4)

  processed_data_m4 <- bake(rec_prep, new_data = df_picked_m4)
  processed_data_m4_df <- as.data.frame(processed_data_m4)
  rownames(processed_data_m4_df) <- rownames(df_picked_m4)
  print(dim(processed_data_m4))
}
if(METHOD_FEATURE_FLAG == 4){
  AfterProcess_FeatureName_m4<-colnames(processed_data_m4)
  print(length(AfterProcess_FeatureName_m4))
  head(AfterProcess_FeatureName_m4)
  tail(AfterProcess_FeatureName_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  levels(df_picked_m4$DX)
}
if(METHOD_FEATURE_FLAG == 4){
  lastColumn_NUM_m4<-dim(processed_data_m4)[2]
  last5Column_NUM_m4<-lastColumn_NUM_m4-5
  head(processed_data_m4[,last5Column_NUM_m4 :lastColumn_NUM_m4])
}
if(METHOD_FEATURE_FLAG == 4){
  print(levels(processed_data_m4$DX))
  print(dim(processed_data_m4))
}

(5) Method Five - CN vs MCI

In this method, only the CN and MCI classes will be considered.

if(METHOD_FEATURE_FLAG == 5){
  
  df_fs_method5<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 5){
  phenotic_features_m5<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m5<-c(phenotic_features_m5,featureName_CpGs)
  df_picked_m5<-df_fs_method5[,pickedFeatureName_m5]

  df_picked_m5$DX<-as.factor(df_picked_m5$DX)
  df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)
  head(df_picked_m5[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 5){
  dim(df_picked_m5)
}
Filter and Change to Classification with ‘CN vs MCI’
if(METHOD_FEATURE_FLAG == 5){
  df_picked_m5<-df_picked_m5 %>%  filter(DX != "Dementia") %>% droplevels()

  
  df_picked_m5$DX<-as.factor(df_picked_m5$DX)
  df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)

  head(df_picked_m5[1:10],n=3)

}
if(METHOD_FEATURE_FLAG == 5){
  print(dim(df_picked_m5))
  print(table(df_picked_m5$DX))
}
if(METHOD_FEATURE_FLAG == 5){
  df_fs_method5 <- df_fs_method5 %>%  filter(DX != "Dementia") %>% droplevels()
  df_fs_method5$DX<-as.factor(df_fs_method5$DX)
  print(head(df_fs_method5))
  print(dim(df_fs_method5))
}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 5){
  pheno_data_m5 <- df_picked_m5[,phenotic_features_m5] 
  print(head(pheno_data_m5[,1:5],n=3))

  design_m5 <- model.matrix(~0 + .,data=pheno_data_m5)

  colnames(design_m5)[colnames(design_m5) == "DXCN"] <- "CN"
  colnames(design_m5)[colnames(design_m5) == "DXMCI"] <- "MCI"

  print(head(design_m5))

  beta_values_m5 <- t(as.matrix(df_fs_method5[,featureName_CpGs]))

}

In order to perform the differential analysis - Differentially Methylated Position (DMP) - we have to define the contrast that we are interested in. In method 5, we focus on two groups (CN and MCI), giving one contrast of interest.

if(METHOD_FEATURE_FLAG == 5){

  fit_m5 <- lmFit(beta_values_m5, design_m5)
  head(fit_m5$coefficients)


  contrast.matrix <- makeContrasts(MCI - CN, levels = design_m5)
 
  fit2_m5 <- contrasts.fit(fit_m5, contrast.matrix)

  # Apply the empirical Bayes’ step to get our differential expression statistics and p-values.

  fit2_m5 <- eBayes(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  decideTests(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  dmp_results_m5_try1 <- decideTests(
    fit2_m5, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m5_try1)

}

These constraints are too tight; let’s relax them.

if(METHOD_FEATURE_FLAG == 5){
  # Identify DMPs, we will use this one:
  dmp_results_m5 <- decideTests(
    fit2_m5, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m5)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 5){

  significant_dmp_filter <- dmp_results_m5 != 0 
  significant_cpgs_m5_DMP <- rownames(dmp_results_m5)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m5_afterDMP<-c(phenotic_features_m5,significant_cpgs_m5_DMP)
  df_picked_m5<-df_picked_m5[,pickedFeatureName_m5_afterDMP]

  dim(df_picked_m5)
}
Visualize with the results of DMP

The “Volcano Plot” is one way to visualize the results of a DE analysis.

X-axis shows the log-fold change in methylation levels between the two classes. In general, the Log Fold Change (LogFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\); when beta values are modeled directly, as here, limma’s logFC column is the estimated difference in mean beta values between the groups.

Interpretation of logFC:

  • Positive LogFC: Indicates that the measurement is higher in the first group compared to the second group, here means hypermethylation (increase in methylation).

  • Negative LogFC: Indicates that the measurement is lower in the first group compared to the second group, here means hypomethylation (decrease in methylation) in the experimental condition compared to the reference.

  • LogFC of 0: Indicates no difference in the measurement between the two groups.

Y-axis shows some measure of statistical significance, such as the log-odds, or “B” statistic. In the following, we will use the B statistic. The log-odds is calculated as \(B = \log_e(\text{posterior odds})\).

Interpretation of B-value:

  • Higher B-value: Indicates stronger evidence for differential methylation.

  • Lower (or negative) B-value: Indicates weaker evidence for differential methylation.

  • B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.

A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 5){
  full_results_m5 <- topTable(fit2_m5, number=Inf)
  full_results_m5 <- tibble::rownames_to_column(full_results_m5,"ID")
  head(full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  sorted_full_results_m5 <- full_results_m5[
    order(full_results_m5$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  library(ggplot2)
  ggplot(full_results_m5,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoffs applied

if(METHOD_FEATURE_FLAG == 5){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m5 <- full_results_m5 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m5, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to -log10(P value)

if(METHOD_FEATURE_FLAG == 5){
  ggplot(full_results_m5,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 5){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m5 <- full_results_m5 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m5, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “Recipe” to Process Data
if(METHOD_FEATURE_FLAG == 5){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m5) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m5)

  processed_data_m5 <- bake(rec_prep, new_data = df_picked_m5)
  processed_data_m5_df <- as.data.frame(processed_data_m5)
  rownames(processed_data_m5_df) <- rownames(df_picked_m5)
  print(dim(processed_data_m5))
}
if(METHOD_FEATURE_FLAG == 5){
  AfterProcess_FeatureName_m5<-colnames(processed_data_m5)
  print(length(AfterProcess_FeatureName_m5))
  head(AfterProcess_FeatureName_m5)
  tail(AfterProcess_FeatureName_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  levels(df_picked_m5$DX)
}
if(METHOD_FEATURE_FLAG == 5){
  lastColumn_NUM_m5<-dim(processed_data_m5)[2]
  last5Column_NUM_m5<-lastColumn_NUM_m5-5
  head(processed_data_m5[,last5Column_NUM_m5 :lastColumn_NUM_m5])
}
if(METHOD_FEATURE_FLAG == 5){
  print(levels(processed_data_m5$DX))
  print(dim(processed_data_m5))
}

(6) Method Six - MCI vs AD (Dementia)

In this method, only the MCI and Dementia (AD) classes will be considered.

if(METHOD_FEATURE_FLAG == 6){
  
  df_fs_method6<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 6){
  phenotic_features_m6<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m6<-c(phenotic_features_m6,featureName_CpGs)
  df_picked_m6<-df_fs_method6[,pickedFeatureName_m6]

  df_picked_m6$DX<-as.factor(df_picked_m6$DX)
  df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)
  head(df_picked_m6[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 6){
  dim(df_picked_m6)
}
Filter and Change to Classification with ‘MCI vs Dementia’
if(METHOD_FEATURE_FLAG == 6){
  df_picked_m6<-df_picked_m6 %>%  filter(DX != "CN") %>% droplevels()

  
  df_picked_m6$DX<-as.factor(df_picked_m6$DX)
  df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)

  head(df_picked_m6[1:10],n=3)

}
if(METHOD_FEATURE_FLAG == 6){
  print(dim(df_picked_m6))
  print(table(df_picked_m6$DX))
}
if(METHOD_FEATURE_FLAG == 6){
  df_fs_method6 <- df_fs_method6 %>%  filter(DX != "CN") %>% droplevels()
  df_fs_method6$DX<-as.factor(df_fs_method6$DX)
  print(head(df_fs_method6))
  print(dim(df_fs_method6))
}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 6){
  pheno_data_m6 <- df_picked_m6[,phenotic_features_m6] 
  print(head(pheno_data_m6[,1:5],n=3))

  design_m6 <- model.matrix(~0 + .,data=pheno_data_m6)

  colnames(design_m6)[colnames(design_m6) == "DXDementia"] <- "Dementia"
  colnames(design_m6)[colnames(design_m6) == "DXMCI"] <- "MCI"

  print(head(design_m6))

  beta_values_m6 <- t(as.matrix(df_fs_method6[,featureName_CpGs]))

}

In order to perform the differential analysis - Differentially Methylated Position (DMP) - we have to define the contrast that we are interested in. In method 6, we focus on two groups (MCI and Dementia), giving one contrast of interest.

if(METHOD_FEATURE_FLAG == 6){

  fit_m6 <- lmFit(beta_values_m6, design_m6)
  head(fit_m6$coefficients)


  contrast.matrix <- makeContrasts(MCI - Dementia, levels = design_m6)
 
  fit2_m6 <- contrasts.fit(fit_m6, contrast.matrix)

  # Apply the empirical Bayes’ step to get our differential expression statistics and p-values.

  fit2_m6 <- eBayes(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  decideTests(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  dmp_results_m6_try1 <- decideTests(
    fit2_m6, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m6_try1)

}

These constraints are too tight; let’s relax them.

if(METHOD_FEATURE_FLAG == 6){
  # Identify DMPs, we will use this one:
  dmp_results_m6 <- decideTests(
    fit2_m6, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m6)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 6){

  significant_dmp_filter <- dmp_results_m6 != 0 
  significant_cpgs_m6_DMP <- rownames(dmp_results_m6)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m6_afterDMP<-c(phenotic_features_m6,significant_cpgs_m6_DMP)
  df_picked_m6<-df_picked_m6[,pickedFeatureName_m6_afterDMP]

  dim(df_picked_m6)
}
Visualize with the results of DMP

The “Volcano Plot” is one way to visualize the results of a DE analysis.

X-axis shows the log-fold change in methylation levels between the two classes. In general, the Log Fold Change (LogFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\); when beta values are modeled directly, as here, limma’s logFC column is the estimated difference in mean beta values between the groups.

Interpretation of logFC:

  • Positive LogFC: Indicates that the measurement is higher in the first group compared to the second group, here means hypermethylation (increase in methylation).

  • Negative LogFC: Indicates that the measurement is lower in the first group compared to the second group, here means hypomethylation (decrease in methylation) in the experimental condition compared to the reference.

  • LogFC of 0: Indicates no difference in the measurement between the two groups.

Y-axis shows some measure of statistical significance, such as the log-odds, or “B” statistic. In the following, we will use the B statistic. The log-odds is calculated as \(B = \log_e(\text{posterior odds})\).

Interpretation of B-value:

  • Higher B-value: Indicates stronger evidence for differential methylation.

  • Lower (or negative) B-value: Indicates weaker evidence for differential methylation.

  • B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.

A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 6){
  full_results_m6 <- topTable(fit2_m6, number=Inf)
  full_results_m6 <- tibble::rownames_to_column(full_results_m6,"ID")
  head(full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  sorted_full_results_m6 <- full_results_m6[
    order(full_results_m6$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  library(ggplot2)
  ggplot(full_results_m6,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoffs applied

if(METHOD_FEATURE_FLAG == 6){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m6 <- full_results_m6 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m6, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to -log10(P value)

if(METHOD_FEATURE_FLAG == 6){
  ggplot(full_results_m6,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 6){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m6 <- full_results_m6 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m6, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “Recipe” to Process Data
if(METHOD_FEATURE_FLAG == 6){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m6) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m6)

  processed_data_m6 <- bake(rec_prep, new_data = df_picked_m6)
  processed_data_m6_df <- as.data.frame(processed_data_m6)
  rownames(processed_data_m6_df) <- rownames(df_picked_m6)
  print(dim(processed_data_m6))
}
if(METHOD_FEATURE_FLAG == 6){
  AfterProcess_FeatureName_m6<-colnames(processed_data_m6)
  print(length(AfterProcess_FeatureName_m6))
  head(AfterProcess_FeatureName_m6)
  tail(AfterProcess_FeatureName_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  levels(df_picked_m6$DX)
}
if(METHOD_FEATURE_FLAG == 6){
  lastColumn_NUM_m6<-dim(processed_data_m6)[2]
  last5Column_NUM_m6<-lastColumn_NUM_m6-5
  head(processed_data_m6[,last5Column_NUM_m6 :lastColumn_NUM_m6])
}
if(METHOD_FEATURE_FLAG == 6){
  print(levels(processed_data_m6$DX))
  print(dim(processed_data_m6))
}

1.7 INPUT - Model Train

The name for “processed_data” can be one of:

  1. “processed_data_m1”, which uses method one to process the data.

  2. “processed_data_m2”, which uses method two to process the data; note that the features will be principal components.

  3. “processed_data_m3”, which uses method three to process the data. This method transfers “DX” to a binary class: “CN” stays the same, and “MCI” and “Dementia” are merged into “CI”.

    Note that “processed_data_m3_df” is the data-frame version of “processed_data_m3” with sample names as row names.

  4. “processed_data_m4”, which uses method four to process the data. This method filters “DX” (dropping the “MCI” class), limiting it to the CN and Dementia (AD) classes.

  5. “processed_data_m5”, which uses method five to process the data. This method filters “DX” (dropping the “Dementia” class), limiting it to the CN and MCI classes.

  6. “processed_data_m6”, which uses method six to process the data. This method filters “DX” (dropping the “CN” class), limiting it to the MCI and Dementia classes.

The name for “AfterProcess_FeatureName” (including the “DX” label) can be one of:

  1. “AfterProcess_FeatureName_m1”, the column names of the data frame processed with method one.
  2. “AfterProcess_FeatureName_m2”, the column names from the principal component method.
  3. “AfterProcess_FeatureName_m3”, the column names of the data frame processed with method three (binary “DX”: “CN” stays the same, “MCI” and “Dementia” merged into “CI”).
  4. “AfterProcess_FeatureName_m4”, the column names of the data frame processed with method four (“DX” limited to the CN and Dementia (AD) classes).
  5. “AfterProcess_FeatureName_m5”, the column names of the data frame processed with method five (“DX” limited to the CN and MCI classes).
  6. “AfterProcess_FeatureName_m6”, the column names of the data frame processed with method six (“DX” limited to the MCI and Dementia classes).
# Select the processed data, its data-frame version, and the feature names
# for the chosen method, driven by METHOD_FEATURE_FLAG (1-6)
suffix <- paste0("_m", METHOD_FEATURE_FLAG)

processed_dataFrame      <- get(paste0("processed_data", suffix, "_df"))
processed_data           <- get(paste0("processed_data", suffix))
AfterProcess_FeatureName <- get(paste0("AfterProcess_FeatureName", suffix))
print(head(processed_dataFrame))
##                      age.now          PC1           PC2          PC3 cg02621446 cg23916408
## 200223270003_R02C01 82.40000 -0.214185447  1.470293e-02 -0.014043316  0.8731313  0.1942275
## 200223270003_R03C01 78.60000 -0.172761185  5.745834e-02  0.005055871  0.8095534  0.9154993
## 200223270003_R06C01 80.40000 -0.003667305  8.372861e-02  0.029143653  0.7511582  0.8886255
## 200223270003_R07C01 78.16441 -0.186779607 -1.117250e-02 -0.032302430  0.8773609  0.8872447
## 200223270006_R01C01 62.90000  0.026814649  1.650735e-05  0.052947950  0.2046541  0.2219945
## 200223270006_R04C01 80.67796 -0.037862929  1.571950e-02 -0.008685676  0.7963817  0.1520624
##  … (output truncated: remaining CpG beta-value columns omitted)
## 200223270003_R06C01  0.6998389  0.2970779  0.7786685  0.8259149 0.92044873 0.70094907
## 200223270003_R07C01  0.2189042  0.2954090  0.8260541  0.8333940 0.91674311 0.11367159
## 200223270006_R01C01  0.5570021  0.8935876  0.3295384  0.8761177 0.02943747 0.09458405
## 200223270006_R04C01  0.4501196  0.8901338  0.8541667  0.8585363 0.89057041 0.86532175
##                     cg00247094 cg03084184 cg01549082 cg26948066 cg07523188 cg26474732
## 200223270003_R02C01  0.5399349  0.8162981  0.2924138  0.4685225  0.7509183  0.7843252
## 200223270003_R03C01  0.9315640  0.7877128  0.7065693  0.5026045  0.1524386  0.8184088
## 200223270003_R06C01  0.5177874  0.4546397  0.2895440  0.9101976  0.7127592  0.7358417
## 200223270003_R07C01  0.5377765  0.7812413  0.6422955  0.9379543  0.8464983  0.7509296
## 200223270006_R01C01  0.9109309  0.7818230  0.8471236  0.9120181  0.7847738  0.8294938
## 200223270006_R04C01  0.5266535  0.7725853  0.6949888  0.8868608  0.8231277  0.8033167
##                     cg11133939 cg02225060 cg12279734 cg10240127 cg23432430 cg16652920
## 200223270003_R02C01  0.1282694  0.6828159  0.6435368  0.9250553  0.9482702  0.9436000
## 200223270003_R03C01  0.5920898  0.8265195  0.1494651  0.9403255  0.9455418  0.9431222
## 200223270003_R06C01  0.5127706  0.5209552  0.8760759  0.9056974  0.9418716  0.9457161
## 200223270003_R07C01  0.8474176  0.8078889  0.8674214  0.9396217  0.9426559  0.9419785
## 200223270006_R01C01  0.8589133  0.6084903  0.6454450  0.9262370  0.9461736  0.9529417
## 200223270006_R04C01  0.5246557  0.7638781  0.8660058  0.9240497  0.9508404  0.9492648
##                     cg06112204 cg12228670 cg19503462 cg07028768 cg14240646 cg09584650
## 200223270003_R02C01  0.5251592  0.8632174  0.7951675  0.4496851  0.5391334 0.08230254
## 200223270003_R03C01  0.8773488  0.8496212  0.4537684  0.8536078  0.2538363 0.09661586
## 200223270003_R06C01  0.8867975  0.8738949  0.6997359  0.8356936  0.1864902 0.52399749
## 200223270003_R07C01  0.5613799  0.8362189  0.7189778  0.4245893  0.6402007 0.11587211
## 200223270006_R01C01  0.9184122  0.8079694  0.7301755  0.8835151  0.7696079 0.42115185
## 200223270006_R04C01  0.9152514  0.6966666  0.4207207  0.4514661  0.1490028 0.56043178
##                     cg27272246 cg16749614 cg04664583 cg26757229 cg03982462 cg06715136
## 200223270003_R02C01  0.8615873  0.8678741  0.5572814  0.6723726  0.8562777  0.3400192
## 200223270003_R03C01  0.8705287  0.8539348  0.5881190  0.1422661  0.6023731  0.9259109
## 200223270003_R06C01  0.8103777  0.5874127  0.9352717  0.7933794  0.8778458  0.9079807
## 200223270003_R07C01  0.0310881  0.5555391  0.9350230  0.8074830  0.8860227  0.6782105
## 200223270006_R01C01  0.7686536  0.8026346  0.9424588  0.5265692  0.8703107  0.8369052
## 200223270006_R04C01  0.4403542  0.7903978  0.9379537  0.7341953  0.8792860  0.8807568
##                     cg15501526 cg04248279 cg01680303 cg06536614 cg26219488 cg18819889
## 200223270003_R02C01  0.6362531  0.8534976  0.5095174  0.5824474  0.9336638  0.9156157
## 200223270003_R03C01  0.6319253  0.8458854  0.1344941  0.5746694  0.9134707  0.9004455
## 200223270003_R06C01  0.7435100  0.8332786  0.7573869  0.5773468  0.9261878  0.9054439
## 200223270003_R07C01  0.7756577  0.3303204  0.4772204  0.5848917  0.9217866  0.9089935
## 200223270006_R01C01  0.3230777  0.5966878  0.1176263  0.5669919  0.4929692  0.9065397
## 200223270006_R04C01  0.8342695  0.8939599  0.5133033  0.5718514  0.9431574  0.9242767
##                     cg05570109 cg02981548 cg08861434 cg00689685 cg17429539 cg00322003
## 200223270003_R02C01  0.3466611  0.1342571  0.8768306  0.7019389  0.7860900  0.1759911
## 200223270003_R03C01  0.5866750  0.5220037  0.4352647  0.8634268  0.7100923  0.5702070
## 200223270003_R06C01  0.4046471  0.5098965  0.8698813  0.6378795  0.7660838  0.3077122
## 200223270003_R07C01  0.6014355  0.5660985  0.4709249  0.8624541  0.6984969  0.6104341
## 200223270006_R01C01  0.5774881  0.5678714  0.8618532  0.6361891  0.6508597  0.6147419
## 200223270006_R04C01  0.8756826  0.5079859  0.9058965  0.6356260  0.2828452  0.2293759
##                     cg11247378 cg07152869 cg00154902 cg14527649 cg27452255 cg03129555
## 200223270003_R02C01  0.1591185  0.8284151  0.5137741  0.2678912  0.9001010  0.6079616
## 200223270003_R03C01  0.7874849  0.5050630  0.8540746  0.7954683  0.6593379  0.5785498
## 200223270003_R06C01  0.4807942  0.8352490  0.8188126  0.8350610  0.9012217  0.9137818
## 200223270003_R07C01  0.4537348  0.5194300  0.4625776  0.8428684  0.8898635  0.9043041
## 200223270006_R01C01  0.1537079  0.5025709  0.4690086  0.8231348  0.5779792  0.9286357
## 200223270006_R04C01  0.1686356  0.8080916  0.4547219  0.8022444  0.8809143  0.9088564
##                     cg06697310 cg20507276 cg27577781 cg20685672 cg03660162       DX
## 200223270003_R02C01  0.8454609 0.12238910  0.8143535 0.67121006  0.8691767      MCI
## 200223270003_R03C01  0.8653044 0.38721972  0.8113185 0.79320906  0.5160770       CN
## 200223270003_R06C01  0.2405168 0.47978438  0.8144274 0.66136456  0.9026304       CN
## 200223270003_R07C01  0.8479193 0.02261996  0.7970617 0.80838304  0.5305691 Dementia
## 200223270006_R01C01  0.8206613 0.37465798  0.8640044 0.08291414  0.9257451      MCI
## 200223270006_R04C01  0.7839595 0.03570795  0.8840237 0.84460055  0.8935772       CN
print(dim(processed_dataFrame))
## [1] 648 156
print(length(AfterProcess_FeatureName))
## [1] 156
print(head(processed_data))
## # A tibble: 6 × 156
##   age.now      PC1        PC2      PC3 cg02621446 cg23916408 cg12146221 cg05234269 cg14293999
##     <dbl>    <dbl>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1    82.4 -0.214    0.0147    -0.0140       0.873      0.194      0.205     0.938       0.284
## 2    78.6 -0.173    0.0575     0.00506      0.810      0.915      0.181     0.575       0.917
## 3    80.4 -0.00367  0.0837     0.0291       0.751      0.889      0.862     0.0247      0.917
## 4    78.2 -0.187   -0.0112    -0.0323       0.877      0.887      0.124     0.565       0.919
## 5    62.9  0.0268   0.0000165  0.0529       0.205      0.222      0.202     0.948       0.197
## 6    80.7 -0.0379   0.0157    -0.00869      0.796      0.152      0.138     0.563       0.903
## # ℹ 147 more variables: cg19377607 <dbl>, cg14307563 <dbl>, cg21209485 <dbl>, cg11331837 <dbl>,
## #   cg11187460 <dbl>, cg14564293 <dbl>, cg12012426 <dbl>, cg00999469 <dbl>, cg17421046 <dbl>,
## #   cg27639199 <dbl>, cg24851651 <dbl>, cg16788319 <dbl>, cg25879395 <dbl>, cg18339359 <dbl>,
## #   cg12284872 <dbl>, cg24506579 <dbl>, cg05321907 <dbl>, cg10985055 <dbl>, cg20139683 <dbl>,
## #   cg10750306 <dbl>, cg01667144 <dbl>, cg27341708 <dbl>, cg12466610 <dbl>, cg03327352 <dbl>,
## #   cg02320265 <dbl>, cg08779649 <dbl>, cg13885788 <dbl>, cg25561557 <dbl>, cg01413796 <dbl>,
## #   cg26069044 <dbl>, cg03088219 <dbl>, cg12682323 <dbl>, cg17738613 <dbl>, cg17186592 <dbl>, …
print(dim(processed_data))
## [1] 648 156
print(AfterProcess_FeatureName)
##   [1] "age.now"    "PC1"        "PC2"        "PC3"        "cg02621446" "cg23916408" "cg12146221"
##   [8] "cg05234269" "cg14293999" "cg19377607" "cg14307563" "cg21209485" "cg11331837" "cg11187460"
##  [15] "cg14564293" "cg12012426" "cg00999469" "cg17421046" "cg27639199" "cg24851651" "cg16788319"
##  [22] "cg25879395" "cg18339359" "cg12284872" "cg24506579" "cg05321907" "cg10985055" "cg20139683"
##  [29] "cg10750306" "cg01667144" "cg27341708" "cg12466610" "cg03327352" "cg02320265" "cg08779649"
##  [36] "cg13885788" "cg25561557" "cg01413796" "cg26069044" "cg03088219" "cg12682323" "cg17738613"
##  [43] "cg17186592" "cg17906851" "cg01933473" "cg16771215" "cg11438323" "cg27086157" "cg15535896"
##  [50] "cg18821122" "cg05841700" "cg10738648" "cg16579946" "cg20370184" "cg12784167" "cg15633912"
##  [57] "cg02494911" "cg21854924" "cg25436480" "cg12534577" "cg15865722" "cg06864789" "cg24859648"
##  [64] "cg16178271" "cg00675157" "cg10369879" "cg22274273" "cg01128042" "cg08198851" "cg04412904"
##  [71] "cg11227702" "cg20913114" "cg02932958" "cg00962106" "cg15775217" "cg21697769" "cg16715186"
##  [78] "cg00696044" "cg12738248" "cg01013522" "cg00616572" "cg05096415" "cg01153376" "cg09854620"
##  [85] "cg24861747" "cg19512141" "cg06378561" "cg02356645" "cg02372404" "cg06950937" "cg00272795"
##  [92] "cg03071582" "cg08584917" "cg23161429" "cg07138269" "cg13080267" "cg25758034" "cg23658987"
##  [99] "cg25259265" "cg14924512" "cg14710850" "cg06118351" "cg07480176" "cg08857872" "cg20678988"
## [106] "cg24873924" "cg01921484" "cg12776173" "cg00247094" "cg03084184" "cg01549082" "cg26948066"
## [113] "cg07523188" "cg26474732" "cg11133939" "cg02225060" "cg12279734" "cg10240127" "cg23432430"
## [120] "cg16652920" "cg06112204" "cg12228670" "cg19503462" "cg07028768" "cg14240646" "cg09584650"
## [127] "cg27272246" "cg16749614" "cg04664583" "cg26757229" "cg03982462" "cg06715136" "cg15501526"
## [134] "cg04248279" "cg01680303" "cg06536614" "cg26219488" "cg18819889" "cg05570109" "cg02981548"
## [141] "cg08861434" "cg00689685" "cg17429539" "cg00322003" "cg11247378" "cg07152869" "cg00154902"
## [148] "cg14527649" "cg27452255" "cg03129555" "cg06697310" "cg20507276" "cg27577781" "cg20685672"
## [155] "cg03660162" "DX"
print("Number of Features:")
## [1] "Number of Features:"
Num_feaForProcess = length(AfterProcess_FeatureName)-1 # exclude the "DX" label
print(Num_feaForProcess) 
## [1] 155

2. Logistic Regression Model

2.1 Logistic Regression Model Training

df_LRM1<-processed_data 
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)

set.seed(123)  # for reproducibility
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 455 156
dim(testData)
## [1] 193 156
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_modelTrain_LRM1 <- caret::confusionMatrix(predictions, testData$DX)

print(cm_modelTrain_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       46        7  14
##   Dementia  3       10   4
##   MCI      17       11  81
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7098          
##                  95% CI : (0.6403, 0.7728)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 2.018e-08       
##                                           
##                   Kappa : 0.4987          
##                                           
##  Mcnemar's Test P-Value : 0.1607          
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6970         0.35714     0.8182
## Specificity             0.8346         0.95758     0.7021
## Pos Pred Value          0.6866         0.58824     0.7431
## Neg Pred Value          0.8413         0.89773     0.7857
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2383         0.05181     0.4197
## Detection Prevalence    0.3472         0.08808     0.5648
## Balanced Accuracy       0.7658         0.65736     0.7602
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_modelTrain_LRM1_Accuracy<-cm_modelTrain_LRM1$overall["Accuracy"]
cm_modelTrain_LRM1_Kappa<-cm_modelTrain_LRM1$overall["Kappa"]
print(cm_modelTrain_LRM1_Accuracy)
##  Accuracy 
## 0.7098446
print(cm_modelTrain_LRM1_Kappa)
##     Kappa 
## 0.4987013
print(model_LRM1)
## glmnet 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001810831  0.6350263  0.3962356
##   0.10   0.0018108309  0.6482375  0.4142573
##   0.10   0.0181083090  0.6548792  0.4144240
##   0.55   0.0001810831  0.6263550  0.3765308
##   0.55   0.0018108309  0.6505792  0.4121576
##   0.55   0.0181083090  0.6461597  0.3827291
##   1.00   0.0001810831  0.6087233  0.3485056
##   1.00   0.0018108309  0.6417152  0.3949776
##   1.00   0.0181083090  0.5867925  0.2663062
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01810831.
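caret picks the winning `alpha`/`lambda` pair as the tuning row with the largest resampled Accuracy, and stores it in `model_LRM1$bestTune`. The sketch below mimics that selection rule on a toy stand-in for the `$results` table; the numbers are hypothetical placeholders, not the real grid above.

```r
# Minimal sketch of caret's "largest Accuracy wins" tuning rule, applied to a
# toy stand-in for model_LRM1$results (values here are hypothetical).
toy_results <- data.frame(
  alpha    = c(0.10, 0.10, 0.55, 1.00),
  lambda   = c(0.0018108, 0.0181083, 0.0018108, 0.0018108),
  Accuracy = c(0.6482, 0.6549, 0.6506, 0.6417)
)
best_row <- toy_results[which.max(toy_results$Accuracy), ]
print(best_row)  # keeps the alpha = 0.10 / lambda = 0.0181 row
```

On the fitted object itself, `model_LRM1$bestTune` returns the selected pair directly.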
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")


train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.96043956043956"
modelTrain_LRM1_trainAccuracy<-train_accuracy

print(modelTrain_LRM1_trainAccuracy)
## [1] 0.9604396
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
modelTrain_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(modelTrain_mean_accuracy_cv_LRM1)
## [1] 0.6331631
library(caret)
library(pROC)
# For the binary-classification flags, the three branches differ only in which
# class is treated as positive for the ROC curve, so they collapse into one.
if (METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)) {
  positive_class <- switch(as.character(METHOD_FEATURE_FLAG),
                           "3" = "CI",
                           "4" = "Dementia",
                           "5" = "MCI",
                           "6" = "Dementia")
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX,
                   prob_predictions[, positive_class],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)
  modelTrain_LRM1_AUC <- auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  class_cols <- seq_along(classes) + 1  # one distinct palette colour per class
  plot(roc_curves[[1]], col = class_cols[1],
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = class_cols, lwd = 2)

   
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8487
## The AUC value for class CN is: 0.8487235 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.831
## The AUC value for class Dementia is: 0.8309524 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8189
## The AUC value for class MCI is: 0.818934

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_LRM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.83287
print(modelTrain_LRM1_AUC)
## [1] 0.83287
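The per-class AUC that `pROC::roc` reports is the Mann-Whitney (rank) statistic, so the one-versus-rest values above can be reproduced without pROC. The sketch below uses a tiny made-up label/score vector, not the model's predictions.

```r
# Rank-based AUC (Mann-Whitney): a self-contained sketch of the per-class
# one-vs-rest statistic. Toy labels/scores only, not the model's output.
auc_rank <- function(labels, scores) {
  # labels: 1 = positive class, 0 = rest; scores: predicted probabilities
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

toy_labels <- c(1, 1, 0, 0, 0)
toy_scores <- c(0.9, 0.6, 0.7, 0.2, 0.1)
auc_rank(toy_labels, toy_scores)  # 5/6: one of the six pos/neg pairs is mis-ranked
```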
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##                CN Dementia    MCI
## PC1        90.434  100.000  0.000
## PC2        46.684   78.708  0.000
## PC3         6.233    0.000 68.068
## cg00962106 63.055   11.820 36.946
## cg02225060 23.026   12.636 51.157
## cg14710850 49.611    8.376 25.404
## cg27452255 49.043   17.847 11.831
## cg02981548 26.242    5.624 49.019
## cg08861434 48.632    0.000 42.794
## cg19503462 25.906   48.114  5.790
## cg07152869 27.976   46.742  1.373
## cg16749614 11.544   17.975 45.949
## cg05096415  1.401   44.886 28.916
## cg23432430 44.232    3.492 25.272
## cg17186592  3.086   41.991 26.679
## cg00247094 15.870   41.652 10.423
## cg09584650 41.417    6.520 18.544
## cg11133939 24.223    0.000 40.464
## cg16715186 39.191    7.687 17.054
## cg03129555 12.434   38.574  8.410
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")

importance_model_LRM1_df<-importance_model_LRM1$importance
if (METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)) {

  importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

  library(dplyr)

  ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>%
    arrange(desc(Overall))

  print(ordered_importance_final_model_LRM1)

}
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
##             CN     Dementia         MCI    Feature MaxImportance
## 1   90.4344829 1.000000e+02  0.00000000        PC1   100.0000000
## 2   46.6839222 7.870805e+01  0.00000000        PC2    78.7080496
## 3    6.2328162 0.000000e+00 68.06814488        PC3    68.0681449
## 4   63.0552396 1.182025e+01 36.94618316 cg00962106    63.0552396
## 5   23.0256423 1.263604e+01 51.15668910 cg02225060    51.1566891
## 6   49.6114597 8.375718e+00 25.40447309 cg14710850    49.6114597
## 7   49.0426664 1.784727e+01 11.83094250 cg27452255    49.0426664
## 8   26.2421391 5.623593e+00 49.01884510 cg02981548    49.0188451
## 9   48.6316741 0.000000e+00 42.79414741 cg08861434    48.6316741
## 10  25.9056334 4.811382e+01  5.79002920 cg19503462    48.1138238
## 11  27.9758274 4.674230e+01  1.37255301 cg07152869    46.7422954
## 12  11.5439720 1.797550e+01 45.94910047 cg16749614    45.9491005
## 13   1.4007011 4.488618e+01 28.91575860 cg05096415    44.8861773
## 14  44.2318100 3.492439e+00 25.27236685 cg23432430    44.2318100
## 15   3.0857895 4.199093e+01 26.67887307 cg17186592    41.9909350
## 16  15.8701456 4.165152e+01 10.42296023 cg00247094    41.6515174
## 17  41.4170473 6.519659e+00 18.54359467 cg09584650    41.4170473
## 18  24.2229738 0.000000e+00 40.46366699 cg11133939    40.4636670
## 19  39.1906814 7.687132e+00 17.05368074 cg16715186    39.1906814
## 20  12.4335216 3.857431e+01  8.41047510 cg03129555    38.5743135
## 21   3.1975305 2.009482e+01 38.48178203 cg08857872    38.4817820
## 22  12.1267569 3.681366e+01 11.11575266 cg06864789    36.8136560
## 23   0.0000000 3.526450e+01 26.75307191 cg14924512    35.2645029
## 24   7.2034302 1.187744e+01 34.91240716 cg16652920    34.9124072
## 25  19.0892781 3.462629e+01  0.00000000 cg03084184    34.6262858
## 26   3.6595748 1.337504e+01 34.17533403 cg26219488    34.1753340
## 27  13.4696898 3.379055e+01  6.05974339 cg20913114    33.7905520
## 28   7.1339389 3.347220e+01 11.82338431 cg06378561    33.4722024
## 29  33.3316253 1.550029e+01  2.09187016 cg26948066    33.3316253
## 30   0.5622498 3.328357e+01 17.47528280 cg25259265    33.2835690
## 31  33.2211736 0.000000e+00 21.60293744 cg06536614    33.2211736
## 32   1.6519657 3.232403e+01 17.24679048 cg24859648    32.3240298
## 33  12.7595657 3.077640e+01  2.20542082 cg12279734    30.7764036
## 34  30.6984661 1.115898e+01  2.50462012 cg03982462    30.6984661
## 35   1.2155507 3.061490e+01 16.61316072 cg05841700    30.6149031
## 36  29.8250204 7.639480e+00  7.72771336 cg11227702    29.8250204
## 37  25.3395122 0.000000e+00 29.02900714 cg12146221    29.0290071
## 38   9.6443024 8.947027e+00 28.92927076 cg02621446    28.9292708
## 39   0.0000000 2.258056e+01 28.83675421 cg00616572    28.8367542
## 40  28.4405639 8.992560e+00  6.53191286 cg15535896    28.4405639
## 41  25.4461478 0.000000e+00 28.22422237 cg02372404    28.2242224
## 42   5.0485282 2.776085e+01  8.13030308 cg09854620    27.7608463
## 43  27.6118761 0.000000e+00 15.84264987 cg04248279    27.6118761
## 44   4.0112641 7.691476e+00 27.53732819 cg20678988    27.5373282
## 45   0.0000000 2.751128e+01 13.85128088 cg24861747    27.5112819
## 46  27.4916653 1.565730e+01  0.00000000 cg10240127    27.4916653
## 47   7.7673556 7.237543e+00 27.22040905 cg16771215    27.2204090
## 48   0.6494625 2.697143e+01 14.64691461 cg01667144    26.9714309
## 49  26.9340775 8.940726e+00  2.80991093 cg13080267    26.9340775
## 50   0.0000000 2.615117e+01 26.58162739 cg02494911    26.5816274
## 51   9.3828836 2.645607e+01  5.12487040 cg10750306    26.4560652
## 52  25.4477953 1.210178e+00 11.25303029 cg11438323    25.4477953
## 53   4.8678332 4.041647e+00 25.42751373 cg06715136    25.4275137
## 54  25.1189917 0.000000e+00 15.38030487 cg04412904    25.1189917
## 55   4.7675523 2.483637e+01  5.38989554 cg12738248    24.8363747
## 56  24.4204434 0.000000e+00 18.67541535 cg03071582    24.4204434
## 57   0.0000000 2.428213e+01 15.80040203 cg05570109    24.2821316
## 58  24.2207010 2.027864e+01  0.00000000 cg15775217    24.2207010
## 59   0.0000000 1.993091e+01 24.20455571 cg24873924    24.2045557
## 60   7.5573086 4.150281e+00 24.12319507 cg17738613    24.1231951
## 61  23.8473094 0.000000e+00 20.77999536 cg01921484    23.8473094
## 62   0.0000000 1.632489e+01 23.69841380 cg10369879    23.6984138
## 63   0.0000000 1.842186e+01 23.64630853 cg27341708    23.6463085
## 64   0.0000000 2.356446e+01 21.42456671 cg12534577    23.5644638
## 65   0.0000000 2.340991e+01 17.84326042 cg18821122    23.4099060
## 66   4.6199090 6.918591e+00 23.35218072 cg12682323    23.3521807
## 67  23.3278818 0.000000e+00 14.17259995 cg05234269    23.3278818
## 68  23.0176353 0.000000e+00 22.81015000 cg20685672    23.0176353
## 69  20.3601152 0.000000e+00 22.85377420 cg12228670    22.8537742
## 70  22.7096273 3.661922e+00  8.33669238 cg11331837    22.7096273
## 71   0.0000000 2.268978e+01 20.85857948 cg01680303    22.6897830
## 72  22.4120115 1.162854e+00 10.22545813 cg17421046    22.4120115
## 73  22.2804698 8.055739e+00  2.25662360 cg03088219    22.2804698
## 74  22.2513002 1.529470e+01  0.00000000 cg02356645    22.2513002
## 75  22.2504181 1.930716e+01  0.00000000 cg00322003    22.2504181
## 76   5.9019825 2.209156e+01  1.27018775 cg01013522    22.0915627
## 77  12.6358759 0.000000e+00 21.79598580 cg00272795    21.7959858
## 78  21.6466651 0.000000e+00 14.52658200 cg25758034    21.6466651
## 79   4.7731408 2.161796e+01  1.17837356 cg26474732    21.6179639
## 80   0.0000000 2.127023e+01 17.62935554 cg16579946    21.2702334
## 81   9.6112523 2.119858e+01  0.00000000 cg07523188    21.1985800
## 82  21.1973751 4.527337e+00  5.64210393 cg11187460    21.1973751
## 83   0.0000000 1.704269e+01 20.81174411 cg14527649    20.8117441
## 84   2.7202966 4.869331e+00 20.54066769 cg20370184    20.5406677
## 85  20.5303029 0.000000e+00 13.70608410 cg17429539    20.5303029
## 86   0.0000000 2.028240e+01 10.01345098 cg20507276    20.2824035
## 87   1.1867078 6.814770e+00 20.19281595 cg13885788    20.1928160
## 88   0.0000000 1.558541e+01 20.05749322 cg16178271    20.0574932
## 89   5.5930964 1.529211e+00 19.98634450 cg10738648    19.9863445
## 90   5.1548765 1.992074e+01  2.74954989 cg26069044    19.9207402
## 91   3.2054503 4.951311e+00 19.79635640 cg25879395    19.7963564
## 92  19.6439491 0.000000e+00 12.11785893 cg06112204    19.6439491
## 93   3.2324983 1.921062e+01  1.24898511 cg23161429    19.2106229
## 94  19.0283759 0.000000e+00  8.87014717 cg25436480    19.0283759
## 95  18.8728946 1.899214e+01  0.00000000 cg26757229    18.9921399
## 96  18.8606945 8.141903e+00  0.00000000 cg02932958    18.8606945
## 97   6.3337445 1.861804e+01  0.95397288 cg18339359    18.6180445
## 98  12.0316035 1.860484e+01  0.00000000 cg23916408    18.6048429
## 99  18.5744389 1.503781e+00  1.89144629 cg06950937    18.5744389
## 100  1.5227772 3.199074e+00 18.17528976 cg12784167    18.1752898
## 101 11.9132835 0.000000e+00 18.10151143 cg07480176    18.1015114
## 102  0.0000000 5.506519e+00 17.68798524 cg15865722    17.6879852
## 103 17.6817848 0.000000e+00 13.03892473 cg27577781    17.6817848
## 104 17.1550957 2.938004e+00  2.52930380 cg05321907    17.1550957
## 105 16.8401590 0.000000e+00  7.60307246 cg03660162    16.8401590
## 106 16.7411474 0.000000e+00  9.90396946 cg07138269    16.7411474
## 107 16.7140083 2.389325e-04  5.47626965 cg20139683    16.7140083
## 108  1.5162248 1.661662e+01  3.60056402 cg12284872    16.6166236
## 109 16.5242600 0.000000e+00 15.33313269 cg03327352    16.5242600
## 110  0.0000000 1.651025e+01 12.91106534 cg23658987    16.5102468
## 111  0.0000000 1.476695e+01 16.16221407 cg21854924    16.1622141
## 112 15.7813302 0.000000e+00  6.84076573 cg21697769    15.7813302
## 113 15.6779586 5.756032e+00  0.00000000 cg19512141    15.6779586
## 114 10.3116466 0.000000e+00 15.47815088 cg08198851    15.4781509
## 115  0.4280786 1.509018e+01  0.82840359 cg00675157    15.0901807
## 116  0.0000000 5.719351e+00 15.01578851 cg01153376    15.0157885
## 117  1.8016318 1.495028e+01  0.76033114 cg01933473    14.9502833
## 118 14.8786805 0.000000e+00  4.61118553 cg12776173    14.8786805
## 119  0.0000000 1.065936e+01 14.72386008 cg14564293    14.7238601
## 120 12.4166349 0.000000e+00 14.55809383 cg24851651    14.5580938
## 121  0.0000000 1.452280e+01  2.27124301 cg22274273    14.5228011
## 122 12.7993364 1.450845e+01  0.00000000 cg25561557    14.5084539
## 123 13.7813968 1.440283e+01  0.00000000 cg21209485    14.4028332
## 124  3.9030612 1.429224e+01  0.00000000 cg10985055    14.2922449
## 125  8.0847376 0.000000e+00 14.23403693 cg14293999    14.2340369
## 126  0.0000000 6.070016e+00 13.99431366 cg18819889    13.9943137
## 127  7.9364691 1.388963e+01  0.00000000 cg24506579    13.8896326
## 128 10.4693081 0.000000e+00 13.82844499 cg19377607    13.8284450
## 129  2.6082790 1.361679e+01  0.00000000 cg06697310    13.6167854
## 130 13.5523883 0.000000e+00 10.18252032 cg00696044    13.5523883
## 131  0.0000000 0.000000e+00 13.07862910 cg01549082    13.0786291
## 132  0.0000000 6.883520e+00 13.07489084 cg01128042    13.0748908
## 133  0.2735344 1.248508e+01  1.15822265 cg00999469    12.4850825
## 134  0.0000000 1.077830e+01 12.38511672 cg06118351    12.3851167
## 135  0.0000000 1.124520e+01 11.81413047 cg12012426    11.8141305
## 136 11.7410716 9.447329e+00  0.00000000 cg08584917    11.7410716
## 137 11.6834885 0.000000e+00 11.18198961 cg27272246    11.6834885
## 138  0.0000000 1.167004e+01  2.24079957 cg15633912    11.6700418
## 139  1.2020056 1.135081e+01  0.00000000 cg16788319    11.3508120
## 140 11.3467673 1.967722e+00  0.00000000 cg17906851    11.3467673
## 141  8.9763770 0.000000e+00 11.28260165 cg07028768    11.2826016
## 142  0.0000000 3.122665e+00 10.73693584 cg27086157    10.7369358
## 143  1.8134330 9.597518e+00  0.00000000 cg14240646     9.5975184
## 144  0.0000000 9.458714e+00  9.18234012 cg00154902     9.4587141
## 145  6.6678268 0.000000e+00  9.09857595 cg14307563     9.0985759
## 146  0.0000000 8.507666e+00  0.00000000 cg02320265     8.5076657
## 147  8.1996351 0.000000e+00  7.04889093 cg08779649     8.1996351
## 148  7.6525336 0.000000e+00  7.97801696 cg04664583     7.9780170
## 149  0.0000000 0.000000e+00  6.63057437 cg12466610     6.6305744
## 150  6.2606420 3.710632e+00  0.00000000 cg27639199     6.2606420
## 151  0.0000000 0.000000e+00  5.83487998 cg15501526     5.8348800
## 152  0.0000000 4.828481e+00  3.67138763 cg00689685     4.8284809
## 153  2.7832004 0.000000e+00  0.08909863 cg01413796     2.7832004
## 154  0.0000000 0.000000e+00  2.11969854 cg11247378     2.1196985
## 155  0.5214480 0.000000e+00  0.63591611    age.now     0.6359161
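Per-model rankings like the table above feed the frequency / common-feature rule described in the version notes: take the top-N list from each trained model, count how often each feature appears, and keep the features that appear in more than half of the models. The sketch below runs that rule on three hypothetical top-lists; the model names and feature sets are made up for illustration.

```r
# Hedged sketch of the frequency / common-feature selection from the version
# notes. The three top-lists below are hypothetical, not fitted-model output.
top_lists <- list(
  LRM = c("PC1", "PC2", "cg00962106", "cg02225060"),
  RF  = c("PC1", "cg00962106", "cg14710850", "cg02225060"),
  XGB = c("PC1", "cg27452255", "cg00962106", "cg08861434")
)

freq <- table(unlist(top_lists))  # appearance count of each feature
# keep features appearing in more than half of the models (here, > 1.5 of 3)
common_features <- names(freq[freq > length(top_lists) / 2])
print(common_features)  # PC1, cg00962106 and cg02225060 appear in 2+ of 3 lists
```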
# require() already attaches the package on success, so no else branch is needed
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}

if (METHOD_FEATURE_FLAG == 1) {

  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()

}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
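The warning above notes that `reshape2` is superseded. A minimal sketch of the same reshape using `tidyr::pivot_longer` instead, assuming `importance_model_LRM1_df` has the `CN`/`Dementia`/`MCI`/`Feature`/`MaxImportance` columns printed above:

```r
# Sketch: long-format reshape with tidyr instead of the superseded
# reshape2::melt. Produces the same Feature / Class / Importance columns.
library(dplyr)
library(tidyr)

importance_melted_LRM1_df <- importance_model_LRM1_df %>%
  dplyr::select(-MaxImportance) %>%
  tidyr::pivot_longer(cols = -Feature,
                      names_to = "Class",
                      values_to = "Importance")
```

The resulting data frame can be passed to the same `ggplot` call used above.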

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##           CN   Dementia       MCI    Feature MaxImportance
## 1  90.434483 100.000000  0.000000        PC1     100.00000
## 2  46.683922  78.708050  0.000000        PC2      78.70805
## 3   6.232816   0.000000 68.068145        PC3      68.06814
## 4  63.055240  11.820255 36.946183 cg00962106      63.05524
## 5  23.025642  12.636043 51.156689 cg02225060      51.15669
## 6  49.611460   8.375718 25.404473 cg14710850      49.61146
## 7  49.042666  17.847272 11.830943 cg27452255      49.04267
## 8  26.242139   5.623593 49.018845 cg02981548      49.01885
## 9  48.631674   0.000000 42.794147 cg08861434      48.63167
## 10 25.905633  48.113824  5.790029 cg19503462      48.11382
## 11 27.975827  46.742295  1.372553 cg07152869      46.74230
## 12 11.543972  17.975498 45.949100 cg16749614      45.94910
## 13  1.400701  44.886177 28.915759 cg05096415      44.88618
## 14 44.231810   3.492439 25.272367 cg23432430      44.23181
## 15  3.085789  41.990935 26.678873 cg17186592      41.99093
## 16 15.870146  41.651517 10.422960 cg00247094      41.65152
## 17 41.417047   6.519659 18.543595 cg09584650      41.41705
## 18 24.222974   0.000000 40.463667 cg11133939      40.46367
## 19 39.190681   7.687132 17.053681 cg16715186      39.19068
## 20 12.433522  38.574313  8.410475 cg03129555      38.57431
## [1] "the top 20 features based on max way:"
##  [1] "PC1"        "PC2"        "PC3"        "cg00962106" "cg02225060" "cg14710850" "cg27452255"
##  [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555"

2.2 Model Diagnosis & Improvement

2.2.1 Class Imbalance

Class Imbalance Check

  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##       CN Dementia      MCI 
##      221       94      333
prop.table(table(df_LRM1$DX))
## 
##        CN  Dementia       MCI 
## 0.3410494 0.1450617 0.5138889
table(trainData$DX)
## 
##       CN Dementia      MCI 
##      155       66      234
prop.table(table(trainData$DX))
## 
##        CN  Dementia       MCI 
## 0.3406593 0.1450549 0.5142857
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the whole data set is:")
    ## [1] "The imbalance ratio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 3.542553
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the training data set is:")
    ## [1] "The imbalance ratio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 3.545455
  • Let’s run a Chi-square test, which determines whether the class distribution deviates significantly from a balanced one. The p-value from the test indicates how significant the class imbalance is.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 132.4, df = 2, p-value < 2.2e-16
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 93.156, df = 2, p-value < 2.2e-16

Address Class Imbalance with “SMOTE” (preliminary; may need further improvement)

library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)


balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##       CN Dementia      MCI 
##      155      132      234
dim(balanced_data_LGR_1)
## [1] 521 156
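Note that `dup_size = 1` only doubles the minority class (Dementia goes from 66 to 132), which still leaves it well below MCI (234). A hedged sketch of choosing `dup_size` from the observed class counts so the minority class approaches the majority count (`dup_size_balanced` is an illustrative name, not from the original code):

```r
# Sketch: derive dup_size so the minority class roughly matches the
# majority. Assumes trainData and smotefamily as loaded above.
library(smotefamily)

counts <- table(trainData$DX)
# synthetic copies per minority sample needed to close most of the gap
dup_size_balanced <- floor(max(counts) / min(counts)) - 1

smote_balanced <- SMOTE(X = trainData[, !names(trainData) %in% "DX"],
                        target = trainData$DX,
                        K = 5, dup_size = dup_size_balanced)
```

With the counts shown above this gives `dup_size_balanced = 2`, i.e. roughly 198 Dementia samples instead of 132.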

Fit Model with Balanced Data

ctrl <- trainControl(method = "cv", number = 5)


model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)


predictions <- predict(model_LRM2, newdata = testData)
cm_modelTrain_LRM2<-caret::confusionMatrix(predictions, testData$DX)
print(cm_modelTrain_LRM2)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       45        6  15
##   Dementia  4       11   6
##   MCI      17       11  78
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6943          
##                  95% CI : (0.6241, 0.7584)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 2.356e-07       
##                                           
##                   Kappa : 0.4779          
##                                           
##  Mcnemar's Test P-Value : 0.5733          
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6818         0.39286     0.7879
## Specificity             0.8346         0.93939     0.7021
## Pos Pred Value          0.6818         0.52381     0.7358
## Neg Pred Value          0.8346         0.90116     0.7586
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2332         0.05699     0.4041
## Detection Prevalence    0.3420         0.10881     0.5492
## Balanced Accuracy       0.7582         0.66613     0.7450
cm_modelTrain_LRM2_Accuracy<-cm_modelTrain_LRM2$overall["Accuracy"]
cm_modelTrain_LRM2_Kappa<-cm_modelTrain_LRM2$overall["Kappa"]
print(cm_modelTrain_LRM2_Accuracy)
##  Accuracy 
## 0.6943005
print(cm_modelTrain_LRM2_Kappa)
##    Kappa 
## 0.477924
print(model_LRM2)
## glmnet 
## 
## 521 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 416, 417, 417, 417, 417 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa    
##   0.10   0.000186946  0.7064835  0.5493874
##   0.10   0.001869460  0.7121978  0.5563269
##   0.10   0.018694597  0.7180220  0.5649649
##   0.55   0.000186946  0.7007143  0.5401066
##   0.55   0.001869460  0.7102930  0.5525186
##   0.55   0.018694597  0.6872894  0.5142517
##   1.00   0.000186946  0.6815018  0.5106741
##   1.00   0.001869460  0.7006777  0.5383593
##   1.00   0.018694597  0.6468864  0.4489232
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0186946.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")


train_accuracy <- mean(train_predictions == trainData$DX)

modelTrain_LRM2_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", modelTrain_LRM2_trainAccuracy))
## [1] "Training Accuracy:  0.958241758241758"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.6960073
modelTrain_LRM2_mean_accuracy_model_LRM2 <- mean_accuracy_model_LRM2
print(modelTrain_LRM2_mean_accuracy_model_LRM2)
## [1] 0.6960073
importance_model_LRM2 <- varImp(model_LRM2)

print(importance_model_LRM2)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##                CN Dementia    MCI
## PC1        80.704  100.000  0.000
## PC2        38.892   80.705  0.000
## cg00962106 56.201    9.090 33.503
## PC3         7.689    0.000 55.751
## cg19503462 26.316   48.654  6.550
## cg27452255 47.897   21.174  8.087
## cg07152869 27.965   46.007  1.318
## cg02225060 18.278   12.778 45.594
## cg05096415  3.335   45.575 28.308
## cg14710850 45.318    8.637 21.709
## cg02981548 23.101    5.918 45.304
## cg08861434 44.834    0.000 36.637
## cg03129555 14.445   41.997 10.550
## cg23432430 41.986    6.868 20.302
## cg16749614  8.925   17.017 41.743
## cg17186592  3.594   40.128 25.162
## cg14924512  1.844   38.968 23.218
## cg09584650 38.239    7.574 15.083
## cg06864789 13.551   38.069 11.887
## cg03084184 19.827   37.852  3.069
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6){
  importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
  library(dplyr)
  ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
  print(ordered_importance_final_model_LRM2)
}
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM2_df)
  
}
##               CN     Dementia          MCI    Feature MaxImportance
## 1   80.703686141 100.00000000  0.000000000        PC1   100.0000000
## 2   38.892106774  80.70482771  0.000000000        PC2    80.7048277
## 3   56.201479700   9.08972707 33.503421018 cg00962106    56.2014797
## 4    7.688739102   0.00000000 55.751183885        PC3    55.7511839
## 5   26.316001409  48.65418043  6.550251667 cg19503462    48.6541804
## 6   47.896588361  21.17415643  8.086550372 cg27452255    47.8965884
## 7   27.965098019  46.00690924  1.318166875 cg07152869    46.0069092
## 8   18.278300515  12.77780684 45.593955040 cg02225060    45.5939550
## 9    3.334662043  45.57501817 28.308064223 cg05096415    45.5750182
## 10  45.317553155   8.63704057 21.709092237 cg14710850    45.3175532
## 11  23.101263030   5.91845809 45.303881158 cg02981548    45.3038812
## 12  44.833547439   0.00000000 36.636739164 cg08861434    44.8335474
## 13  14.444863358  41.99728281 10.550232984 cg03129555    41.9972828
## 14  41.986435467   6.86821373 20.302170331 cg23432430    41.9864355
## 15   8.925154158  17.01732091 41.743083297 cg16749614    41.7430833
## 16   3.594336479  40.12756008 25.162036726 cg17186592    40.1275601
## 17   1.844403961  38.96836649 23.218245781 cg14924512    38.9683665
## 18  38.238819997   7.57425527 15.082827531 cg09584650    38.2388200
## 19  13.550836792  38.06891722 11.886579897 cg06864789    38.0689172
## 20  19.826782299  37.85235354  3.068918378 cg03084184    37.8523535
## 21  21.494484511   0.51248954 37.519314166 cg11133939    37.5193142
## 22  13.590537474  37.17398397  9.106089935 cg00247094    37.1739840
## 23   0.546760644  20.67374130 35.720839473 cg08857872    35.7208395
## 24  35.478778383   7.95324361 14.045617414 cg16715186    35.4787784
## 25   4.944695358  35.05438841 17.439917764 cg24859648    35.0543884
## 26  14.088834696  34.55990196  5.441433884 cg12279734    34.5599020
## 27   1.721991260  34.09480960 18.443604362 cg25259265    34.0948096
## 28   8.420908584  34.06039806 11.644562269 cg06378561    34.0603981
## 29   2.318696798  13.36834282 31.989893449 cg26219488    31.9898934
## 30  12.462516572  31.58502611  5.781491250 cg20913114    31.5850261
## 31   5.481129412  11.23741551 31.361194654 cg16652920    31.3611947
## 32   1.409335240  30.97517179 17.381030655 cg05841700    30.9751718
## 33  29.684237819  14.08949397  0.800327955 cg26948066    29.6842378
## 34  28.741079800  12.28173949  0.038600884 cg03982462    28.7410798
## 35  28.246236512   8.07867524  6.652009298 cg11227702    28.2462365
## 36   6.451683537  28.03614623  8.129228528 cg09854620    28.0361462
## 37  27.457851489   0.00000000 21.578341138 cg06536614    27.4578515
## 38   7.548981775   9.68673919 27.091557976 cg02621446    27.0915580
## 39   0.000000000  26.98140952 24.133953778 cg02494911    26.9814095
## 40  20.434384880   0.00000000 26.640690302 cg12146221    26.6406903
## 41   0.000000000  25.78989084 26.596408896 cg00616572    26.5964089
## 42   9.535314131  26.42283712  5.646486425 cg10750306    26.4228371
## 43  26.171246498   7.89239684  6.028970680 cg15535896    26.1712465
## 44   1.141241069  25.92805819 13.642098297 cg01667144    25.9280582
## 45   0.000000000  25.62258868 13.483020973 cg24861747    25.6225887
## 46  25.562890663  15.08442524  0.000000000 cg10240127    25.5628907
## 47  24.107334555   0.00000000 25.129944649 cg02372404    25.1299446
## 48   1.099920364   8.20445533 25.057279275 cg06715136    25.0572793
## 49  24.802706624   0.00000000 16.164643453 cg20685672    24.8027066
## 50   0.000000000  24.74797600 14.637492919 cg05570109    24.7479760
## 51  24.747252306   0.00000000 13.430078586 cg04248279    24.7472523
## 52   4.039949965   5.50360620 24.334481230 cg20678988    24.3344812
## 53   0.000000000  24.20182274 18.412414046 cg12534577    24.2018227
## 54   0.000000000  24.14528640 15.852423949 cg16579946    24.1452864
## 55   4.824103433  24.11211052  5.706278058 cg12738248    24.1121105
## 56   6.529552881   5.93633077 24.070923901 cg16771215    24.0709239
## 57  23.998697732  10.16375705  0.028444076 cg13080267    23.9986977
## 58   5.508573597   5.66470926 23.060145881 cg17738613    23.0601459
## 59  22.320691887   6.53204639  5.663980561 cg11331837    22.3206919
## 60   0.000000000  22.29255346 17.220491420 cg01680303    22.2925535
## 61  22.203937989   0.00000000 13.212696497 cg04412904    22.2039380
## 62   0.000000000  22.06075522 14.962802230 cg18821122    22.0607552
## 63   3.426205425   7.31616766 22.054014997 cg12682323    22.0540150
## 64  22.044090381  16.25405024  0.000000000 cg02356645    22.0440904
## 65   0.000000000  20.81120723 22.035886216 cg24873924    22.0358862
## 66   0.000000000  15.83806198 22.017151253 cg10369879    22.0171513
## 67   6.483591345  21.73850252  0.949117939 cg01013522    21.7385025
## 68  16.484539947   0.00000000 21.585055537 cg12228670    21.5850555
## 69   7.512558194  21.11431943  0.000000000 cg07523188    21.1143194
## 70  21.107468333  18.07754102  0.000000000 cg15775217    21.1074683
## 71  21.004224876   0.00000000 16.895333275 cg03071582    21.0042249
## 72  20.964839525   0.00000000 12.103259685 cg05234269    20.9648395
## 73   0.000000000  20.92290026  7.904303978 cg20507276    20.9229003
## 74   0.000000000  19.13703698 20.818072609 cg27341708    20.8180726
## 75  20.448742995   8.88181889  0.345152147 cg03088219    20.4487430
## 76  13.178540035  20.43950122  0.000000000 cg25561557    20.4395012
## 77  20.418843523   0.00000000 19.515999145 cg01921484    20.4188435
## 78   4.716200159  20.18026145  4.191000972 cg26069044    20.1802615
## 79  20.114237367   0.00000000  7.566075546 cg06112204    20.1142374
## 80  20.092712908   0.00000000 10.279542242 cg25758034    20.0927129
## 81  20.070487519   0.22767831  9.404986617 cg17421046    20.0704875
## 82  19.734694924   0.00000000  9.880887107 cg17429539    19.7346949
## 83  19.722311790   0.00000000 12.781522977 cg11438323    19.7223118
## 84  19.506576639  14.87078135  0.000000000 cg00322003    19.5065766
## 85  19.312640558   4.15131066  4.739257476 cg11187460    19.3126406
## 86   2.519503668   5.41476250 18.974767697 cg25879395    18.9747677
## 87   4.053920988  18.83733509  0.222787042 cg26474732    18.8373351
## 88   2.902849319  18.77753865  2.410773841 cg23161429    18.7775387
## 89   1.678228292   4.79451720 18.695449561 cg20370184    18.6954496
## 90  18.635058437   0.02030071  6.333516204 cg25436480    18.6350584
## 91   0.009367363   7.65726758 18.620269493 cg13885788    18.6202695
## 92  11.432827083  18.29052960  0.000000000 cg23916408    18.2905296
## 93   0.000000000  16.67394266 18.169205978 cg14527649    18.1692060
## 94   5.006391021   1.01514715 18.052832535 cg10738648    18.0528325
## 95   0.000000000  17.95587603 12.792821501 cg23658987    17.9558760
## 96   5.979693923  17.93087828  1.282292283 cg18339359    17.9308783
## 97  10.251947671   0.00000000 17.825772284 cg07480176    17.8257723
## 98  16.786642859  17.80306807  0.000000000 cg26757229    17.8030681
## 99   2.978375115  17.78146141  4.058113259 cg12284872    17.7814614
## 100 17.462538616   8.50325820  0.000000000 cg02932958    17.4625386
## 101  8.075452039  17.45484195  0.000000000 cg24506579    17.4548420
## 102 13.340238124   0.00000000 17.328520368 cg00272795    17.3285204
## 103  0.000000000   7.46135917 17.205305045 cg12784167    17.2053050
## 104 16.749558824   0.00000000  6.659641924 cg03660162    16.7495588
## 105  0.000000000  16.02418230 16.431224038 cg16178271    16.4312240
## 106 16.369728471   0.00000000 11.970813245 cg27577781    16.3697285
## 107 16.142509014   0.00000000  8.252708167 cg07138269    16.1425090
## 108 15.966964085   2.87389006  2.064971565 cg05321907    15.9669641
## 109  0.752128487  15.69090694  2.151492812 cg22274273    15.6909069
## 110  0.464584499   3.15826720 15.547431322 cg15865722    15.5474313
## 111 13.410477411  15.53876646  0.000000000 cg21209485    15.5387665
## 112 15.460415810   0.62945144  3.699987481 cg20139683    15.4604158
## 113  0.804687613  15.26727627  2.246868395 cg15633912    15.2672763
## 114  1.785059975  15.21041165  0.497890838 cg00675157    15.2104117
## 115  0.000000000  15.03562942 13.712510585 cg21854924    15.0356294
## 116  0.000000000   8.28447756 14.989389757 cg14564293    14.9893898
## 117  1.414757231  14.66617650  1.617046584 cg01933473    14.6661765
## 118 14.335634479   0.00000000  2.357215925 cg06950937    14.3356345
## 119  7.029045619   0.00000000 14.262088984 cg14293999    14.2620890
## 120  0.000000000   7.60256780 14.106906124 cg01128042    14.1069061
## 121 13.942362711   0.00000000  2.049705730 cg12776173    13.9423627
## 122 13.939989220   0.00000000 13.916599358 cg03327352    13.9399892
## 123  8.354050412   0.00000000 13.901700226 cg24851651    13.9017002
## 124  8.499466117   0.00000000 13.725880639 cg19377607    13.7258806
## 125 13.691076440   0.00000000  7.338783387 cg00696044    13.6910764
## 126  0.000000000   2.81944706 13.617942761 cg01153376    13.6179428
## 127 13.585923001   3.87624147  0.000000000 cg19512141    13.5859230
## 128  0.000000000   6.29261372 13.547936670 cg18819889    13.5479367
## 129  8.866699352   0.00000000 13.130459074 cg27272246    13.1304591
## 130 12.210246938   0.00000000 12.998907898 cg08198851    12.9989079
## 131  0.000000000   9.82359181 12.661949537 cg06118351    12.6619495
## 132  4.079204165  12.38089824  0.000000000 cg10985055    12.3808982
## 133  0.930325099  11.77391661  0.006453595 cg16788319    11.7739166
## 134  1.061914720  11.72164162  0.000000000 cg14240646    11.7216416
## 135  0.794060278  11.56506514  0.391257306 cg00999469    11.5650651
## 136  0.000000000  11.34570309 10.958707137 cg12012426    11.3457031
## 137  0.000000000   2.70078848 10.861921805 cg01549082    10.8619218
## 138 10.752888079   0.00000000  9.149308817 cg21697769    10.7528881
## 139 10.661004204   0.00000000  7.583647513 cg07028768    10.6610042
## 140 10.322517299   3.96305045  0.000000000 cg17906851    10.3225173
## 141  0.000000000   8.38569469  9.801351175 cg27086157     9.8013512
## 142  0.300234216   9.75810599  0.000000000 cg06697310     9.7581060
## 143  9.748681749   9.22022799  0.000000000 cg08584917     9.7486817
## 144  2.496354466   0.00000000  9.517519131 cg04664583     9.5175191
## 145  0.596325741   9.50789659  0.000000000 cg02320265     9.5078966
## 146  4.880785773   0.00000000  8.716856857 cg14307563     8.7168569
## 147  6.221593172   0.00000000  8.462523133 cg08779649     8.4625231
## 148  0.000000000   6.07441901  7.328898485 cg00154902     7.3288985
## 149  0.000000000   0.00000000  6.410132361 cg12466610     6.4101324
## 150  6.361107171   4.10109446  0.000000000 cg27639199     6.3611072
## 151  0.000000000   5.85370045  4.811164869 cg00689685     5.8537004
## 152  0.000000000   2.99739856  5.183362714 cg15501526     5.1833627
## 153  2.829967253   0.00000000  0.000000000 cg01413796     2.8299673
## 154  0.421123998   0.00000000  0.566168128    age.now     0.5661681
## 155  0.000000000   0.42855835  0.032750493 cg11247378     0.4285583
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##           CN   Dementia       MCI    Feature MaxImportance
## 1  80.703686 100.000000  0.000000        PC1     100.00000
## 2  38.892107  80.704828  0.000000        PC2      80.70483
## 3  56.201480   9.089727 33.503421 cg00962106      56.20148
## 4   7.688739   0.000000 55.751184        PC3      55.75118
## 5  26.316001  48.654180  6.550252 cg19503462      48.65418
## 6  47.896588  21.174156  8.086550 cg27452255      47.89659
## 7  27.965098  46.006909  1.318167 cg07152869      46.00691
## 8  18.278301  12.777807 45.593955 cg02225060      45.59396
## 9   3.334662  45.575018 28.308064 cg05096415      45.57502
## 10 45.317553   8.637041 21.709092 cg14710850      45.31755
## 11 23.101263   5.918458 45.303881 cg02981548      45.30388
## 12 44.833547   0.000000 36.636739 cg08861434      44.83355
## 13 14.444863  41.997283 10.550233 cg03129555      41.99728
## 14 41.986435   6.868214 20.302170 cg23432430      41.98644
## 15  8.925154  17.017321 41.743083 cg16749614      41.74308
## 16  3.594336  40.127560 25.162037 cg17186592      40.12756
## 17  1.844404  38.968366 23.218246 cg14924512      38.96837
## 18 38.238820   7.574255 15.082828 cg09584650      38.23882
## 19 13.550837  38.068917 11.886580 cg06864789      38.06892
## 20 19.826782  37.852354  3.068918 cg03084184      37.85235
## [1] "the top 20 features based on max way:"
##  [1] "PC1"        "PC2"        "cg00962106" "PC3"        "cg19503462" "cg27452255" "cg07152869"
##  [8] "cg02225060" "cg05096415" "cg14710850" "cg02981548" "cg08861434" "cg03129555" "cg23432430"
## [15] "cg16749614" "cg17186592" "cg14924512" "cg09584650" "cg06864789" "cg03084184"

if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  modelTrain_LRM2_AUC <-auc_value

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  modelTrain_LRM2_AUC <-auc_value

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  modelTrain_LRM2_AUC <-auc_value

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # Plot each curve in a distinct colour that matches the legend
  plot(roc_curves[[1]], col = 2, lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 2:(length(classes) + 1), lwd = 2)

   
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8505
## The AUC value for class CN is: 0.850513 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8357
## The AUC value for class Dementia is: 0.8357143 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8188
## The AUC value for class MCI is: 0.8188266

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}
## The mean AUC value across all classes with one versus rest method is: 0.835018
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_LRM2_AUC <-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.835018
print(modelTrain_LRM2_AUC)
## [1] 0.835018
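As a cross-check on the hand-rolled one-versus-rest loop above, the `pROC` package also provides `multiclass.roc`, which accepts the class-probability matrix directly. It averages pairwise AUCs (Hand & Till), so its value may differ slightly from the one-versus-rest mean reported above:

```r
# Sketch: multi-class AUC via pROC::multiclass.roc, reusing the
# probability predictions computed above. Not identical to the
# one-vs-rest mean (pairwise Hand & Till average).
library(pROC)

prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
mc_roc <- multiclass.roc(testData$DX, prob_predictions)
print(auc(mc_roc))
```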

3. Elastic Net

3.1 Elastic Net Model Training

df_ENM1 <- processed_data
featureName_ENM1 <- AfterProcess_FeatureName
library(caret)

set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)

param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))

elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)

print(elastic_net_model1)
## glmnet 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa     
##   0      0.00100000  0.6571736  0.42345797
##   0      0.05357895  0.6725349  0.43439423
##   0      0.10615789  0.6747338  0.43094148
##   0      0.15873684  0.6725599  0.42391171
##   0      0.21131579  0.6725837  0.41818370
##   0      0.26389474  0.6770526  0.42406079
##   0      0.31647368  0.6769804  0.41856449
##   0      0.36905263  0.6726087  0.40853473
##   0      0.42163158  0.6638170  0.38542265
##   0      0.47421053  0.6660148  0.38902178
##   0      0.52678947  0.6594214  0.37628816
##   0      0.57936842  0.6550252  0.36510400
##   0      0.63194737  0.6528274  0.35927177
##   0      0.68452632  0.6418618  0.33471759
##   0      0.73710526  0.6352200  0.31832804
##   0      0.78968421  0.6307756  0.30720022
##   0      0.84226316  0.6263800  0.29777058
##   0      0.89484211  0.6220322  0.28739881
##   0      0.94742105  0.6220322  0.28739881
##   0      1.00000000  0.6220322  0.28682520
##   1      0.00100000  0.6240596  0.37352512
##   1      0.05357895  0.5187546  0.05457313
##   1      0.10615789  0.5142862  0.00000000
##   1      0.15873684  0.5142862  0.00000000
##   1      0.21131579  0.5142862  0.00000000
##   1      0.26389474  0.5142862  0.00000000
##   1      0.31647368  0.5142862  0.00000000
##   1      0.36905263  0.5142862  0.00000000
##   1      0.42163158  0.5142862  0.00000000
##   1      0.47421053  0.5142862  0.00000000
##   1      0.52678947  0.5142862  0.00000000
##   1      0.57936842  0.5142862  0.00000000
##   1      0.63194737  0.5142862  0.00000000
##   1      0.68452632  0.5142862  0.00000000
##   1      0.73710526  0.5142862  0.00000000
##   1      0.78968421  0.5142862  0.00000000
##   1      0.84226316  0.5142862  0.00000000
##   1      0.89484211  0.5142862  0.00000000
##   1      0.94742105  0.5142862  0.00000000
##   1      1.00000000  0.5142862  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2638947.
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.5868408
modelTrain_mean_accuracy_cv_ENM1 <- mean_accuracy_elastic_net_model1
print(modelTrain_mean_accuracy_cv_ENM1)
## [1] 0.5868408
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_ENM1$DX)

modelTrain_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.863736263736264"
print(modelTrain_ENM1_trainAccuracy)
## [1] 0.8637363
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_modelTrain_ENM1<- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_modelTrain_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       45        5  13
##   Dementia  0        8   0
##   MCI      21       15  86
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7202          
##                  95% CI : (0.6512, 0.7823)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 3.473e-09       
##                                           
##                   Kappa : 0.4987          
##                                           
##  Mcnemar's Test P-Value : 6.901e-05       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6818         0.28571     0.8687
## Specificity             0.8583         1.00000     0.6170
## Pos Pred Value          0.7143         1.00000     0.7049
## Neg Pred Value          0.8385         0.89189     0.8169
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2332         0.04145     0.4456
## Detection Prevalence    0.3264         0.04145     0.6321
## Balanced Accuracy       0.7700         0.64286     0.7429
cm_modelTrain_ENM1_Accuracy <- cm_modelTrain_ENM1$overall["Accuracy"]
print(cm_modelTrain_ENM1_Accuracy)
##  Accuracy 
## 0.7202073
cm_modelTrain_ENM1_Kappa <- cm_modelTrain_ENM1$overall["Kappa"]
print(cm_modelTrain_ENM1_Kappa)
##     Kappa 
## 0.4986772
importance_elastic_net_model1<- varImp(elastic_net_model1)


print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##               CN Dementia    MCI
## PC1        86.62  100.000 13.321
## PC2        68.42   88.612 20.132
## cg00962106 72.97   12.366 60.547
## cg02225060 43.14   18.834 62.035
## cg02981548 49.98    8.979 59.014
## cg23432430 57.30   15.766 41.471
## cg14710850 54.51    8.371 46.083
## cg16749614 20.69   33.680 54.425
## cg07152869 48.28   54.282  5.938
## cg08857872 29.00   24.416 53.480
## cg16652920 27.04   25.381 52.485
## cg26948066 51.17   42.097  9.011
## PC3        12.10   38.684 50.845
## cg08861434 48.61    1.041 49.709
## cg27452255 49.50   29.755 19.689
## cg09584650 48.12   20.551 27.505
## cg11133939 31.92   15.800 47.784
## cg19503462 47.24   44.923  2.257
## cg06864789 20.57   46.480 25.853
## cg02372404 30.75   14.690 45.496
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
  # Binary classification case: rank features by overall importance
  importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

  library(dplyr)

  Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>%
    arrange(desc(Overall))

  print(Ordered_importance_elastic_net_final_model1)
}
if(METHOD_FEATURE_FLAG == 1){
  # Multi-class case: for each feature, take the maximum importance
  # across the three classes and sort by it
  importance_elastic_net_model1_df$Feature <- rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_elastic_net_model1_df)
}
##              CN    Dementia        MCI    Feature MaxImportance
## 1   86.61980738 100.0000000 13.3210618        PC1   100.0000000
## 2   68.42071110  88.6123141 20.1324722        PC2    88.6123141
## 3   72.97240265  12.3659467 60.5473251 cg00962106    72.9724027
## 4   43.14232480  18.8338927 62.0353483 cg02225060    62.0353483
## 5   49.97512219   8.9794598 59.0137128 cg02981548    59.0137128
## 6   57.29678176  15.7661987 41.4714522 cg23432430    57.2967818
## 7   54.51398398   8.3713581 46.0834950 cg14710850    54.5139840
## 8   20.68641851  33.6797465 54.4252959 cg16749614    54.4252959
## 9   48.28492677  54.2816940  5.9376364 cg07152869    54.2816940
## 10  29.00490368  24.4161325 53.4801671 cg08857872    53.4801671
## 11  27.04478573  25.3809708 52.4848873 cg16652920    52.4848873
## 12  51.16762313  42.0970116  9.0114807 cg26948066    51.1676231
## 13  12.10171481  38.6842811 50.8451268        PC3    50.8451268
## 14  48.60846917   1.0414802 49.7090802 cg08861434    49.7090802
## 15  49.50310946  29.7550893 19.6888893 cg27452255    49.5031095
## 16  48.11512671  20.5506815 27.5053143 cg09584650    48.1151267
## 17  31.92434694  15.8003166 47.7837944 cg11133939    47.7837944
## 18  47.23931827  44.9227936  2.2573938 cg19503462    47.2393183
## 19  20.56740794  46.4799364 25.8533976 cg06864789    46.4799364
## 20  30.74658536  14.6900138 45.4957300 cg02372404    45.4957300
## 21  13.69843362  45.3182174 31.5606530 cg24859648    45.3182174
## 22  10.38803025  34.7255433 45.1727044 cg14527649    45.1727044
## 23  44.71363816  32.6644840 11.9900233 cg03982462    44.7136382
## 24  43.78509791  14.9929464 28.7330206 cg06536614    43.7850979
## 25   0.06067044  43.3030784 43.1832771 cg17186592    43.3030784
## 26  26.35599836  16.7605895 43.1757187 cg26219488    43.1757187
## 27  42.96866193  14.0889255 28.8206056 cg10240127    42.9686619
## 28  13.43834291  42.8997598 29.4022860 cg00247094    42.8997598
## 29  35.47709793   6.8655424 42.4017712 cg20685672    42.4017712
## 30   3.60009203  42.1583119 38.4990890 cg25259265    42.1583119
## 31  42.14830982  14.2620551 27.8271239 cg16715186    42.1483098
## 32   0.72192332  41.9398729 41.1588187 cg05096415    41.9398729
## 33  34.83789455  41.7675213  6.8704959 cg15775217    41.7675213
## 34  15.97002286  40.5910153 24.5618616 cg24861747    40.5910153
## 35  34.02805902   6.2237390 40.3109289 cg07028768    40.3109289
## 36   4.43445619  39.7357422 35.2421551 cg14924512    39.7357422
## 37  24.98145608  39.6420632 14.6014763 cg03084184    39.6420632
## 38   4.47207238  39.0722526 34.5410494 cg05570109    39.0722526
## 39  34.88239186   4.0051024 38.9466251 cg01921484    38.9466251
## 40   9.76731403  27.7961405 37.6225854 cg00154902    37.6225854
## 41  28.32807068  37.4435859  9.0563843 cg26757229    37.4435859
## 42  37.36311741   9.8522199 27.4517666 cg03660162    37.3631174
## 43  35.88740462   0.5246911 36.4712266 cg12228670    36.4712266
## 44   4.42463153  31.7466394 36.2304018 cg00616572    36.2304018
## 45  14.12327162  36.1674624 21.9850599 cg20507276    36.1674624
## 46   5.46685523  35.4547743 29.9287882 cg05841700    35.4547743
## 47  21.87332560  13.5177259 35.4501824 cg06715136    35.4501824
## 48  22.83960358  12.2764600 35.1751945 cg02621446    35.1751945
## 49  18.36553359  35.0243219 16.5996575 cg12738248    35.0243219
## 50  14.23671333  34.9439501 20.6481059 cg09854620    34.9439501
## 51  32.22002224  34.8183721  2.5392190 cg00322003    34.8183721
## 52   8.08934808  26.6092839 34.7577628 cg24873924    34.7577628
## 53  14.18500560  34.7017215 20.4575850 cg03129555    34.7017215
## 54  34.68040616   7.5913417 27.0299336 cg04412904    34.6804062
## 55  15.01714515  19.5748660 34.6511420 cg17738613    34.6511420
## 56  18.92764654  15.5954372 34.5822146 cg25879395    34.5822146
## 57  34.34592880  10.8922720 23.3945260 cg05234269    34.3459288
## 58  22.75324748  34.0758121 11.2634338 cg20913114    34.0758121
## 59   1.11024972  32.5737876 33.7431682 cg02494911    33.7431682
## 60  17.47533175  33.5173057 15.9828431 cg00675157    33.5173057
## 61  26.90964711  33.4667383  6.4979603 cg12279734    33.4667383
## 62  12.81244098  20.5534266 33.4249985 cg01153376    33.4249985
## 63  30.30228696   2.9712219 33.3326397 cg04248279    33.3326397
## 64  30.64177910  33.2101071  2.5091972 cg06697310    33.2101071
## 65  25.58169803  32.8947626  7.2539337 cg26474732    32.8947626
## 66  19.20507436  13.6298519 32.8940571 cg16771215    32.8940571
## 67   1.21872114  32.7015628 31.4237108 cg12534577    32.7015628
## 68  14.55299268  32.4375273 17.8254038 cg06378561    32.4375273
## 69  19.19337215  13.1667746 32.4192776 cg18819889    32.4192776
## 70  29.78124856  32.2253710  2.3849916 cg01013522    32.2253710
## 71   8.94008603  23.2172015 32.2164184 cg10369879    32.2164184
## 72  31.34262662   9.3197110 21.9637847 cg03327352    31.3426266
## 73  31.30355116   8.6991762 22.5452441 cg07138269    31.3035512
## 74  30.28320694   0.7213164 31.0636542 cg12146221    31.0636542
## 75  31.02005091  11.5458527 19.4150674 cg11227702    31.0200509
## 76  30.51451274   0.2100065 30.7836501 cg27577781    30.7836501
## 77  30.74231821  29.3053393  1.3778481 cg02356645    30.7423182
## 78  10.89284695  19.6097264 30.5617042 cg15865722    30.5617042
## 79  21.13422867  30.5356551  9.3422956 cg18339359    30.5356551
## 80  21.72890824  30.5064745  8.7184354 cg08584917    30.5064745
## 81  30.48807238  16.2409255 14.1880161 cg15535896    30.4880724
## 82   9.35012792  30.3539424 20.9446836 cg01680303    30.3539424
## 83   0.66494142  29.5735803 30.2976526 cg01667144    30.2976526
## 84  17.55718658  29.9353599 12.3190425 cg07523188    29.9353599
## 85  12.72225388  17.0912940 29.8726788 cg21854924    29.8726788
## 86   9.99476417  29.7475418 19.6936468 cg10750306    29.7475418
## 87   5.72549778  29.6192962 23.8346676 cg16579946    29.6192962
## 88  29.45584605   5.8732413 23.5234739 cg11438323    29.4558461
## 89   7.90591688  29.3699310 21.4048833 cg18821122    29.3699310
## 90  13.47506890  15.5239825 29.0581823 cg01128042    29.0581823
## 91  12.44251146  16.5156100 29.0172524 cg14564293    29.0172524
## 92  28.70364944   0.4438248 28.2006938 cg08198851    28.7036494
## 93  25.91934227   2.7083398 28.6868129 cg00696044    28.6868129
## 94  28.65073400   7.4912723 21.1003308 cg17421046    28.6507340
## 95  28.22916163  14.2410737 13.9289571 cg11331837    28.2291616
## 96   4.58143848  23.1881761 27.8287454 cg12682323    27.8287454
## 97  27.76407045  23.1524875  4.5524521 cg02932958    27.7640704
## 98   2.23438876  27.7093483 25.4158287 cg23658987    27.7093483
## 99  13.54531406  14.0663595 27.6708044 cg07480176    27.6708044
## 100 18.99608527   8.5697728 27.6249890 cg10738648    27.6249890
## 101 23.24342302   4.2307549 27.5333088 cg03071582    27.5333088
## 102 27.51218319  13.7211465 13.7319058 cg25758034    27.5121832
## 103  8.31892344  18.5119214 26.8899757 cg06118351    26.8899757
## 104 26.47568285  26.6877656  0.1529519 cg19512141    26.6877656
## 105 15.77820329  26.6266949 10.7893607 cg23161429    26.6266949
## 106 13.98395631  26.3981501 12.3550629 cg11247378    26.3981501
## 107 18.59425527   7.6889075 26.3422936 cg20678988    26.3422936
## 108 14.37330607  11.5502174 25.9826543 cg27086157    25.9826543
## 109 25.84846449   9.7819707 16.0073629 cg03088219    25.8484645
## 110 13.63204082  25.2790065 11.5878348 cg22274273    25.2790065
## 111  2.73202846  22.3681532 25.1593125 cg13885788    25.1593125
## 112  7.97490935  16.6875985 24.7216387 cg14240646    24.7216387
## 113 23.64920445   0.7936390 24.5019743 cg06112204    24.5019743
## 114 24.37942064   4.9143530 19.4059368 cg17429539    24.3794206
## 115 23.06031956  24.3605205  1.2410701 cg25561557    24.3605205
## 116 21.12251075   3.1401716 24.3218132 cg14293999    24.3218132
## 117 15.52461212   8.6507741 24.2345170 cg19377607    24.2345170
## 118 21.14489724  24.1161937  2.9121656 cg06950937    24.1161937
## 119 24.10030759   4.0940187 19.9471581 cg25436480    24.1003076
## 120 14.61554620   9.0258936 23.7005707 cg00272795    23.7005707
## 121 10.00948192  13.3941500 23.4627628 cg12012426    23.4627628
## 122 23.38852933  17.1911787  6.1382198 cg05321907    23.3885293
## 123 23.16383334   9.9827959 13.1219066 cg20139683    23.1638333
## 124  0.72466966  23.1298320 22.3460315 cg26069044    23.1298320
## 125 21.03326043  22.4244053  1.3320140 cg23916408    22.4244053
## 126  0.60816447  22.2322811 21.5649857 cg27341708    22.2322811
## 127 15.97168251  22.2117286  6.1809152 cg13080267    22.2117286
## 128 21.86773439   1.3035382 20.5050654 cg27272246    21.8677344
## 129  0.95871508  21.8471387 20.8292928 cg12284872    21.8471387
## 130  2.41389221  21.7049413 19.2319182 cg00689685    21.7049413
## 131  2.01953773  21.5333800 19.4547114 cg16178271    21.5333800
## 132 21.28126202   8.1255260 13.0966052 cg21209485    21.2812620
## 133 20.59895207  10.6008980  9.9389232 cg24851651    20.5989521
## 134 20.34289806   7.3326234 12.9511438 cg21697769    20.3428981
## 135 20.33374499   6.2181332 14.0564810 cg04664583    20.3337450
## 136 14.64603304  19.9415172  5.2363533 cg00999469    19.9415172
## 137  2.27365806  17.4302757 19.7630646 cg20370184    19.7630646
## 138 18.98361847   4.1866558 14.7378318 cg11187460    18.9836185
## 139 18.44110528   2.0022682 16.3797062 cg12784167    18.4411053
## 140  1.20240217  16.9911440 18.2526771 cg02320265    18.2526771
## 141 17.49711486  13.5814940  3.8564900 cg12776173    17.4971149
## 142 17.28620363   1.2806951 15.9463776 cg08779649    17.2862036
## 143  8.18664789   8.9921517 17.2379305 cg01933473    17.2379305
## 144 17.18897418   8.9556535  8.1741898 cg15501526    17.1889742
## 145 13.77899505  16.9406406  3.1025147 cg10985055    16.9406406
## 146 16.16970264   6.7553447  9.3552271 cg17906851    16.1697026
## 147 11.30016436   4.7162247 16.0755199 cg14307563    16.0755199
## 148  4.33754653  14.3186706  9.9219932 cg16788319    14.3186706
## 149 11.35637215  13.8497088  2.4342058 cg24506579    13.8497088
## 150  9.52822170  12.4287167  2.8413641 cg27639199    12.4287167
## 151  1.91402524  10.3049754 12.2781315 cg12466610    12.2781315
## 152  9.01275784   2.1922847 11.2641734 cg15633912    11.2641734
## 153  0.00000000  11.1759032 11.2350341 cg01413796    11.2350341
## 154  1.46197753   0.1924735  1.7135819 cg01549082     1.7135819
## 155  0.71164295   0.0102105  0.7809843    age.now     0.7809843
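The frequency / common-feature selection described in the version overview (take the Top-N features from each model, count appearances, keep features that appear in more than half of the models) can be sketched as follows. This is a minimal illustration, not code from this chunk: `select_common_features` and `top_feature_lists` are hypothetical names, and each list element stands for the Top-N feature vector of one trained model.

```r
# Sketch of frequency-based common-feature selection (hypothetical helper).
select_common_features <- function(top_feature_lists) {
  n_models <- length(top_feature_lists)
  # Count how often each feature appears across the per-model Top-N lists
  freq <- table(unlist(top_feature_lists))
  # Keep features appearing in more than half of the models
  names(freq[freq > n_models / 2])
}

# Example with three hypothetical models and their Top-3 features:
common <- select_common_features(list(
  c("cg00962106", "cg02225060", "PC1"),
  c("cg00962106", "PC1", "cg02981548"),
  c("PC1", "cg00962106", "cg23432430")
))
print(common)  # cg00962106 and PC1 appear in more than half of the models
```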
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_elastic_net_model1_df,n=20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##          CN   Dementia       MCI    Feature MaxImportance
## 1  86.61981 100.000000 13.321062        PC1     100.00000
## 2  68.42071  88.612314 20.132472        PC2      88.61231
## 3  72.97240  12.365947 60.547325 cg00962106      72.97240
## 4  43.14232  18.833893 62.035348 cg02225060      62.03535
## 5  49.97512   8.979460 59.013713 cg02981548      59.01371
## 6  57.29678  15.766199 41.471452 cg23432430      57.29678
## 7  54.51398   8.371358 46.083495 cg14710850      54.51398
## 8  20.68642  33.679747 54.425296 cg16749614      54.42530
## 9  48.28493  54.281694  5.937636 cg07152869      54.28169
## 10 29.00490  24.416133 53.480167 cg08857872      53.48017
## 11 27.04479  25.380971 52.484887 cg16652920      52.48489
## 12 51.16762  42.097012  9.011481 cg26948066      51.16762
## 13 12.10171  38.684281 50.845127        PC3      50.84513
## 14 48.60847   1.041480 49.709080 cg08861434      49.70908
## 15 49.50311  29.755089 19.688889 cg27452255      49.50311
## 16 48.11513  20.550682 27.505314 cg09584650      48.11513
## 17 31.92435  15.800317 47.783794 cg11133939      47.78379
## 18 47.23932  44.922794  2.257394 cg19503462      47.23932
## 19 20.56741  46.479936 25.853398 cg06864789      46.47994
## 20 30.74659  14.690014 45.495730 cg02372404      45.49573
## [1] "the top 20 features based on max way:"
##  [1] "PC1"        "PC2"        "cg00962106" "cg02225060" "cg02981548" "cg23432430" "cg14710850"
##  [8] "cg16749614" "cg07152869" "cg08857872" "cg16652920" "cg26948066" "PC3"        "cg08861434"
## [15] "cg27452255" "cg09584650" "cg11133939" "cg19503462" "cg06864789" "cg02372404"

if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_ENM1_AUC <-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG ==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_ENM1_AUC <-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_ENM1_AUC <-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)

  # One-versus-rest ROC: treat each class in turn as the positive class
  for (class in classes) {
    binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  # Use one colour vector for both the curves and the legend so they match
  roc_cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = roc_cols[1],
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = roc_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = roc_cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8682
## The AUC value for class CN is: 0.8681699 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8656
## The AUC value for class Dementia is: 0.8655844 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8361
## The AUC value for class MCI is: 0.8361272

if(METHOD_FEATURE_FLAG == 1){
  mean_auc <- mean(auc_values)
  cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
  modelTrain_ENM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8566272
print(modelTrain_ENM1_AUC)
## [1] 0.8566272
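As noted in the version overview, the selected features ultimately feed two data frames into the Pareto-optimal step: the data filtered to the Top-N features and the phenotype data frame. A minimal sketch, assuming the Top-20 features from the max-importance ranking above are used and `DX` serves as the phenotype column (`selected_features`, `filtered_df`, and `phenotype_df` are hypothetical names):

```r
# Hypothetical assembly of the two Pareto-optimal inputs
selected_features <- head(importance_elastic_net_model1_df, 20)$Feature
filtered_df  <- df_ENM1[, intersect(selected_features, colnames(df_ENM1)), drop = FALSE]
phenotype_df <- df_ENM1[, "DX", drop = FALSE]
```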

4. XGBoost

4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
# Start the parallel backend, leaving one core free
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1 <- processed_data
featureName_XGB1 <- AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa    
##   0.3  1          0.6               0.50        50      0.5979264  0.2557362
##   0.3  1          0.6               0.50       100      0.5759956  0.2294683
##   0.3  1          0.6               0.50       150      0.5868647  0.2544769
##   0.3  1          0.6               0.75        50      0.5495987  0.1698425
##   0.3  1          0.6               0.75       100      0.5626878  0.1993241
##   0.3  1          0.6               0.75       150      0.5824202  0.2463429
##   0.3  1          0.6               1.00        50      0.5408547  0.1416942
##   0.3  1          0.6               1.00       100      0.5342358  0.1484288
##   0.3  1          0.6               1.00       150      0.5584361  0.1997970
##   0.3  1          0.8               0.50        50      0.5737490  0.2177328
##   0.3  1          0.8               0.50       100      0.5868886  0.2641710
##   0.3  1          0.8               0.50       150      0.5913330  0.2705229
##   0.3  1          0.8               0.75        50      0.5583171  0.1837199
##   0.3  1          0.8               0.75       100      0.5691835  0.2208545
##   0.3  1          0.8               0.75       150      0.5715751  0.2358985
##   0.3  1          0.8               1.00        50      0.5297924  0.1194723
##   0.3  1          0.8               1.00       100      0.5387036  0.1529392
##   0.3  1          0.8               1.00       150      0.5518904  0.1874991
##   0.3  2          0.6               0.50        50      0.5715268  0.2229975
##   0.3  2          0.6               0.50       100      0.5647890  0.2149601
##   0.3  2          0.6               0.50       150      0.5845931  0.2580084
##   0.3  2          0.6               0.75        50      0.5560939  0.1777674
##   0.3  2          0.6               0.75       100      0.5649812  0.2095009
##   0.3  2          0.6               0.75       150      0.5804146  0.2413210
##   0.3  2          0.6               1.00        50      0.5627839  0.1889220
##   0.3  2          0.6               1.00       100      0.5737007  0.2143958
##   0.3  2          0.6               1.00       150      0.5825891  0.2369697
##   0.3  2          0.8               0.50        50      0.5650783  0.2118362
##   0.3  2          0.8               0.50       100      0.5782428  0.2405047
##   0.3  2          0.8               0.50       150      0.5826140  0.2473843
##   0.3  2          0.8               0.75        50      0.5736529  0.2121701
##   0.3  2          0.8               0.75       100      0.5758507  0.2209327
##   0.3  2          0.8               0.75       150      0.5913558  0.2527139
##   0.3  2          0.8               1.00        50      0.5538716  0.1791261
##   0.3  2          0.8               1.00       100      0.5626151  0.2025712
##   0.3  2          0.8               1.00       150      0.5692090  0.2129927
##   0.3  3          0.6               0.50        50      0.5847162  0.2338449
##   0.3  3          0.6               0.50       100      0.5957536  0.2679650
##   0.3  3          0.6               0.50       150      0.6067675  0.2920283
##   0.3  3          0.6               0.75        50      0.5846669  0.2343243
##   0.3  3          0.6               0.75       100      0.5802952  0.2253302
##   0.3  3          0.6               0.75       150      0.5759234  0.2238336
##   0.3  3          0.6               1.00        50      0.5495010  0.1672815
##   0.3  3          0.6               1.00       100      0.5714795  0.2118475
##   0.3  3          0.6               1.00       150      0.5626878  0.2007798
##   0.3  3          0.8               0.50        50      0.5562160  0.1821684
##   0.3  3          0.8               0.50       100      0.5605877  0.1972578
##   0.3  3          0.8               0.50       150      0.5847651  0.2467598
##   0.3  3          0.8               0.75        50      0.5518188  0.1673963
##   0.3  3          0.8               0.75       100      0.5627356  0.1969223
##   0.3  3          0.8               0.75       150      0.5694017  0.2138141
##   0.3  3          0.8               1.00        50      0.5713580  0.2070527
##   0.3  3          0.8               1.00       100      0.5691841  0.2075598
##   0.3  3          0.8               1.00       150      0.5758751  0.2255635
##   0.4  1          0.6               0.50        50      0.5208791  0.1346972
##   0.4  1          0.6               0.50       100      0.5472777  0.1970138
##   0.4  1          0.6               0.50       150      0.5583883  0.2176027
##   0.4  1          0.6               0.75        50      0.5341381  0.1567189
##   0.4  1          0.6               0.75       100      0.5890864  0.2648557
##   0.4  1          0.6               0.75       150      0.5781207  0.2488912
##   0.4  1          0.6               1.00        50      0.5497431  0.1686314
##   0.4  1          0.6               1.00       100      0.5562133  0.1915082
##   0.4  1          0.6               1.00       150      0.5584116  0.2032226
##   0.4  1          0.8               0.50        50      0.5496698  0.1764031
##   0.4  1          0.8               0.50       100      0.5496210  0.1921672
##   0.4  1          0.8               0.50       150      0.5648851  0.2291330
##   0.4  1          0.8               0.75        50      0.5321357  0.1487241
##   0.4  1          0.8               0.75       100      0.5561432  0.2063757
##   0.4  1          0.8               0.75       150      0.5759468  0.2431895
##   0.4  1          0.8               1.00        50      0.5431491  0.1528527
##   0.4  1          0.8               1.00       100      0.5649578  0.2090746
##   0.4  1          0.8               1.00       150      0.5605855  0.2123300
##   0.4  2          0.6               0.50        50      0.5824208  0.2476564
##   0.4  2          0.6               0.50       100      0.5736773  0.2338540
##   0.4  2          0.6               0.50       150      0.5758996  0.2444243
##   0.4  2          0.6               0.75        50      0.5670595  0.2122257
##   0.4  2          0.6               0.75       100      0.5759956  0.2289802
##   0.4  2          0.6               0.75       150      0.5671317  0.2195227
##   0.4  2          0.6               1.00        50      0.5715284  0.2208683
##   0.4  2          0.6               1.00       100      0.5803180  0.2417389
##   0.4  2          0.6               1.00       150      0.5912104  0.2650706
##   0.4  2          0.8               0.50        50      0.5648622  0.2165493
##   0.4  2          0.8               0.50       100      0.5672289  0.2242254
##   0.4  2          0.8               0.50       150      0.5715517  0.2349181
##   0.4  2          0.8               0.75        50      0.5670101  0.2117268
##   0.4  2          0.8               0.75       100      0.5713320  0.2328243
##   0.4  2          0.8               0.75       150      0.5758003  0.2414781
##   0.4  2          0.8               1.00        50      0.5560949  0.1944241
##   0.4  2          0.8               1.00       100      0.5758518  0.2270638
##   0.4  2          0.8               1.00       150      0.5869618  0.2473621
##   0.4  3          0.6               0.50        50      0.5979997  0.2701671
##   0.4  3          0.6               0.50       100      0.5980480  0.2779443
##   0.4  3          0.6               0.50       150      0.6001736  0.2828485
##   0.4  3          0.6               0.75        50      0.5757546  0.2302640
##   0.4  3          0.6               0.75       100      0.5736285  0.2246279
##   0.4  3          0.6               0.75       150      0.5781446  0.2352394
##   0.4  3          0.6               1.00        50      0.5559473  0.1840425
##   0.4  3          0.6               1.00       100      0.5605617  0.1996699
##   0.4  3          0.6               1.00       150      0.5648845  0.2113145
##   0.4  3          0.8               0.50        50      0.5759734  0.2321133
##   0.4  3          0.8               0.50       100      0.5957063  0.2650687
##   0.4  3          0.8               0.50       150      0.5957058  0.2652292
##   0.4  3          0.8               0.75        50      0.5715517  0.2200919
##   0.4  3          0.8               0.75       100      0.5979976  0.2700231
##   0.4  3          0.8               0.75       150      0.5870080  0.2551672
##   0.4  3          0.8               1.00        50      0.5298890  0.1345156
##   0.4  3          0.8               1.00       100      0.5298407  0.1369925
##   0.4  3          0.8               1.00       150      0.5451776  0.1675295
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter
##  'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma =
##  0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.5686429
modelTrain_mean_accuracy_cv_xgb <- mean_accuracy_xgb_model
print(modelTrain_mean_accuracy_cv_xgb)
## [1] 0.5686429
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)

modelTrain_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", modelTrain_xgb_trainAccuracy))
## [1] "Training Accuracy:  1"
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_modelTrain_xgb <- caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_modelTrain_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       25        4  18
##   Dementia  4        8   1
##   MCI      37       16  80
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5855          
##                  95% CI : (0.5125, 0.6558)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 0.0256922       
##                                           
##                   Kappa : 0.2511          
##                                           
##  Mcnemar's Test P-Value : 0.0001868       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.3788         0.28571     0.8081
## Specificity             0.8268         0.96970     0.4362
## Pos Pred Value          0.5319         0.61538     0.6015
## Neg Pred Value          0.7192         0.88889     0.6833
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.1295         0.04145     0.4145
## Detection Prevalence    0.2435         0.06736     0.6891
## Balanced Accuracy       0.6028         0.62771     0.6221
cm_modelTrain_xgb_Accuracy <- cm_modelTrain_xgb$overall["Accuracy"]
cm_modelTrain_xgb_Kappa <- cm_modelTrain_xgb$overall["Kappa"]
print(cm_modelTrain_xgb_Accuracy)
##  Accuracy 
## 0.5854922
print(cm_modelTrain_xgb_Kappa)
##     Kappa 
## 0.2510671
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 155)
## 
##            Overall
## age.now     100.00
## cg05096415   58.67
## cg15501526   53.97
## cg00962106   52.79
## cg16652920   51.84
## cg14564293   50.39
## cg06864789   50.38
## cg25259265   49.28
## cg04412904   48.22
## cg08857872   46.76
## cg09584650   45.71
## cg01921484   44.85
## cg01128042   42.34
## cg16771215   42.19
## cg02621446   41.35
## cg02981548   40.94
## cg15865722   38.31
## cg03327352   37.93
## cg26948066   37.59
## cg02494911   36.54
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain        Cover   Frequency   Importance
##          <char>        <num>        <num>       <num>        <num>
##   1:    age.now 0.0277964662 0.0305758717 0.014948454 0.0277964662
##   2: cg05096415 0.0165271264 0.0151987640 0.009793814 0.0165271264
##   3: cg15501526 0.0152436978 0.0079960962 0.008762887 0.0152436978
##   4: cg00962106 0.0149232198 0.0145429395 0.009793814 0.0149232198
##   5: cg16652920 0.0146648709 0.0104445177 0.007216495 0.0146648709
##  ---                                                              
## 151: cg04664583 0.0012122062 0.0005288135 0.001030928 0.0012122062
## 152: cg06112204 0.0010745149 0.0024687596 0.004123711 0.0010745149
## 153: cg20678988 0.0010299784 0.0024307498 0.003092784 0.0010299784
## 154: cg07480176 0.0007840553 0.0011485574 0.002577320 0.0007840553
## 155: cg27452255 0.0005271072 0.0008672921 0.002577320 0.0005271072
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_xgb_AUC<-auc_value
  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_xgb_AUC<-auc_value
  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_xgb_AUC<-auc_value
  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)

  for (class in classes) {
    binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  plot(roc_curves[[1]], col = "blue",
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # match legend colours to the plotted curves (the first curve is drawn in blue)
  legend("bottomright", legend = classes,
         col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.7177
## The AUC value for class CN is: 0.7177285 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.7652
## The AUC value for class Dementia is: 0.7651515 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.7245
## The AUC value for class MCI is: 0.7244788

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    
    modelTrain_xgb_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.7357863
print(modelTrain_xgb_AUC)
## [1] 0.7357863
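The one-versus-rest AUC computation above is repeated verbatim for each model. As a minimal sketch (not part of the original pipeline), it could be wrapped in a helper; `model`, `test_df`, and the `outcome` column name are assumptions, and the function relies only on `pROC::roc` and `caret`'s `predict` method used above.

```r
library(pROC)

# Compute per-class one-vs-rest AUCs and their mean for a caret model.
one_vs_rest_auc <- function(model, test_df, outcome = "DX") {
  prob_predictions <- predict(model, newdata = test_df, type = "prob")
  classes <- levels(test_df[[outcome]])

  auc_values <- sapply(classes, function(cl) {
    binary_labels <- ifelse(test_df[[outcome]] == cl, 1, 0)
    as.numeric(roc(binary_labels, prob_predictions[, cl])$auc)
  })

  list(per_class = auc_values, mean_auc = mean(auc_values))
}

# Hypothetical usage: res <- one_vs_rest_auc(xgb_model, testData_XGB1)
```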

5. Random Forest

5.1 Random Forest Model Training

library(caret)
library(randomForest)

df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)

print(rf_model)
## Random Forest 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##     2   0.5363147  0.05522911
##    78   0.5604672  0.13791728
##   155   0.5451298  0.10733955
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 78.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
modelTrain_mean_accuracy_cv_rf <- mean_accuracy_rf_model
print(modelTrain_mean_accuracy_cv_rf)
## [1] 0.5473039
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")

train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
modelTrain_rf_trainAccuracy <- train_accuracy
print(modelTrain_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_modelTrain_rf <- caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_modelTrain_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       19        7   8
##   Dementia  0        0   0
##   MCI      47       21  91
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5699          
##                  95% CI : (0.4969, 0.6408)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 0.06504         
##                                           
##                   Kappa : 0.1684          
##                                           
##  Mcnemar's Test P-Value : 4.978e-12       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity            0.28788          0.0000     0.9192
## Specificity            0.88189          1.0000     0.2766
## Pos Pred Value         0.55882             NaN     0.5723
## Neg Pred Value         0.70440          0.8549     0.7647
## Prevalence             0.34197          0.1451     0.5130
## Detection Rate         0.09845          0.0000     0.4715
## Detection Prevalence   0.17617          0.0000     0.8238
## Balanced Accuracy      0.58488          0.5000     0.5979
cm_modelTrain_rf_Accuracy <- cm_modelTrain_rf$overall["Accuracy"]
cm_modelTrain_rf_Kappa <- cm_modelTrain_rf$overall["Kappa"]
print(cm_modelTrain_rf_Accuracy)
##  Accuracy 
## 0.5699482
print(cm_modelTrain_rf_Kappa)
##     Kappa 
## 0.1684489
importance_rf_model <- varImp(rf_model)

print(importance_rf_model)
## rf variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##                 CN Dementia     MCI
## cg15501526 51.3208    12.11 100.000
## cg01153376 10.9139    54.97  78.593
## cg08857872 48.2026    34.71  65.332
## cg12279734 37.8346    63.84  37.870
## cg06864789 28.4816    58.20  19.875
## cg00962106 45.6839    35.98  57.926
## cg23658987 57.5291    20.66  30.886
## age.now    33.4227    48.57  56.948
## cg16652920 13.5100    31.95  56.438
## cg01921484 33.6159    19.10  54.046
## cg14293999 20.8676    18.92  53.149
## cg25259265 29.4613    51.43  52.515
## cg02494911  0.7986    37.06  52.461
## cg05570109 11.5290    42.89  51.497
## cg21209485 24.8994    51.40  19.205
## cg16579946 25.7024    23.83  49.530
## cg14710850 25.4326    16.41  49.017
## cg17186592 31.5157    49.02  40.990
## cg14924512 19.1858    48.90  34.370
## cg07523188 48.7428    31.32   9.632
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==3){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum importance
  # value across the three classes and add it as a new column.
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_rf_model_df)
  
}
##             CN  Dementia        MCI    Feature MaxImportance
## 1   51.3207894 12.108173 100.000000 cg15501526     100.00000
## 2   10.9138752 54.970568  78.592773 cg01153376      78.59277
## 3   48.2026308 34.713172  65.331798 cg08857872      65.33180
## 4   37.8346214 63.835427  37.870293 cg12279734      63.83543
## 5   28.4815869 58.199940  19.874651 cg06864789      58.19994
## 6   45.6839260 35.980721  57.926383 cg00962106      57.92638
## 7   57.5290605 20.659600  30.885574 cg23658987      57.52906
## 8   33.4227207 48.570461  56.947839    age.now      56.94784
## 9   13.5100360 31.948580  56.438010 cg16652920      56.43801
## 10  33.6159012 19.104834  54.046479 cg01921484      54.04648
## 11  20.8675761 18.919852  53.148657 cg14293999      53.14866
## 12  29.4613424 51.431625  52.515092 cg25259265      52.51509
## 13   0.7986121 37.064166  52.461334 cg02494911      52.46133
## 14  11.5289680 42.894354  51.496632 cg05570109      51.49663
## 15  24.8993598 51.404685  19.204564 cg21209485      51.40469
## 16  25.7024262 23.834731  49.529940 cg16579946      49.52994
## 17  25.4326260 16.410955  49.017250 cg14710850      49.01725
## 18  31.5157046 49.015720  40.989937 cg17186592      49.01572
## 19  19.1857976 48.896195  34.370457 cg14924512      48.89619
## 20  48.7427629 31.319425   9.631507 cg07523188      48.74276
## 21  40.6221994 48.712596  34.953845 cg27639199      48.71260
## 22  48.0206414 25.494056  28.354834 cg11133939      48.02064
## 23  47.5935714 36.219234  37.732426 cg00154902      47.59357
## 24  38.9462725 26.351441  47.583103 cg27086157      47.58310
## 25  18.1683746 32.064900  47.507694 cg11331837      47.50769
## 26  30.6350882 44.589577  47.479350 cg04664583      47.47935
## 27  31.3935616 43.522626  47.450571 cg02621446      47.45057
## 28  47.0986275 39.596778  31.432181 cg16771215      47.09863
## 29  23.4664486 33.055876  47.053678 cg10738648      47.05368
## 30  35.6246443 34.742753  46.714868 cg09854620      46.71487
## 31  13.0742870 46.562383  29.908991 cg23916408      46.56238
## 32  46.3269131 42.570326  17.443387 cg25561557      46.32691
## 33  46.2954606 27.241196  29.628992 cg23432430      46.29546
## 34  45.8511559 24.268446  39.531834 cg10240127      45.85116
## 35  35.3034617 34.279440  45.628304 cg03084184      45.62830
## 36  20.4385245 32.574263  45.534112 cg12228670      45.53411
## 37  22.5847973 23.657220  45.377279 cg20370184      45.37728
## 38  44.9736608 34.789021  18.313085 cg27577781      44.97366
## 39  44.1773684 24.876401  17.122350 cg10369879      44.17737
## 40  29.0142791 22.200580  44.059881 cg06118351      44.05988
## 41  33.6375593 44.033332  36.441838 cg14564293      44.03333
## 42  31.8262561 43.913244  25.732220 cg24859648      43.91324
## 43  18.8419132 43.856618  18.311267 cg27341708      43.85662
## 44  20.0321996 43.647524  31.312070 cg05096415      43.64752
## 45  38.6026083 20.420003  43.564278 cg04412904      43.56428
## 46  26.9174668 43.147902  19.879603 cg18339359      43.14790
## 47  25.3216670 43.116395  30.472543 cg00322003      43.11639
## 48  37.7708111 30.814926  42.181775 cg10985055      42.18178
## 49  42.0129171 27.265964  37.279751 cg05234269      42.01292
## 50  19.9269499 41.980414  23.295586 cg12534577      41.98041
## 51  20.8633676 41.974223  31.281430 cg00999469      41.97422
## 52  29.0897665 41.971921  34.320613 cg16178271      41.97192
## 53  41.9470664 28.823836  20.424692 cg05321907      41.94707
## 54  29.8582156 41.845388  32.224005 cg26948066      41.84539
## 55  39.4537402 35.363548  41.742264 cg17429539      41.74226
## 56  41.6746653 36.721771  17.974660 cg13885788      41.67467
## 57  16.3528060 27.559903  41.529725 cg01549082      41.52973
## 58  27.8506003 11.269979  41.440149 cg01013522      41.44015
## 59  34.7582085 15.224071  41.177109 cg12466610      41.17711
## 60  40.7441049 28.379776  12.261373 cg01680303      40.74410
## 61  23.2554475 40.716489  19.137615 cg06697310      40.71649
## 62  40.4904076 24.211959  25.412176 cg01667144      40.49041
## 63  28.2016064 40.319173  16.862264 cg20913114      40.31917
## 64   9.0074309 40.042779  33.004879 cg02320265      40.04278
## 65  24.9235624 25.915937  39.999881 cg24873924      39.99988
## 66  28.4980181  9.808450  39.957892 cg17906851      39.95789
## 67  39.7728348 28.934544  29.411513 cg03327352      39.77283
## 68  35.3654197 10.561828  39.584403 cg14240646      39.58440
## 69  30.3272001 13.437584  39.492560 cg27272246      39.49256
## 70  35.9409198 39.388577  38.182389 cg02225060      39.38858
## 71  39.3558964 31.408248  19.442412 cg09584650      39.35590
## 72  13.2425645 39.327065  36.842720 cg00247094      39.32707
## 73  32.6987423 37.479267  39.315843 cg01128042      39.31584
## 74   8.6379945 18.821716  39.237911 cg15535896      39.23791
## 75  39.2312131 23.803327  33.203209 cg11187460      39.23121
## 76  39.1274904 20.618820  26.386871 cg24506579      39.12749
## 77  24.2227155 31.155447  38.983723 cg02981548      38.98372
## 78  38.8417423 34.093954  37.621267 cg26757229      38.84174
## 79  38.4691112 16.234568  35.175021        PC2      38.46911
## 80  38.3379213 25.516240  14.240647 cg20507276      38.33792
## 81  32.5200267 33.419767  38.268589 cg15775217      38.26859
## 82  32.6932126  5.572029  38.205181 cg12146221      38.20518
## 83  36.7978241 38.190693  36.347623 cg07028768      38.19069
## 84  32.8869773 16.839476  38.161479        PC3      38.16148
## 85   7.6831196 37.792686  23.243355 cg03982462      37.79269
## 86  25.0462440 26.303589  37.563028 cg02372404      37.56303
## 87  20.2453361 37.306923   8.526062 cg23161429      37.30692
## 88  31.2255588 30.121119  37.195451 cg19512141      37.19545
## 89  33.0451528 18.246981  37.187715 cg06715136      37.18771
## 90  32.7260270 33.631567  37.171932 cg17421046      37.17193
## 91  29.0856804 20.970543  36.793133 cg12284872      36.79313
## 92  24.2592202 36.445780  32.816724 cg12682323      36.44578
## 93  24.6874839 14.739186  36.422434 cg25879395      36.42243
## 94  36.0798290 29.909161   0.000000 cg06950937      36.07983
## 95  30.9845240 25.130094  35.964877 cg26219488      35.96488
## 96  29.7106058 26.260707  35.453553 cg27452255      35.45355
## 97  29.7340080 35.290011  33.036275 cg00616572      35.29001
## 98   9.3209127 27.239359  35.088114 cg14527649      35.08811
## 99  25.3936604 34.696248  15.141832 cg18819889      34.69625
## 100 34.6732058 26.388891  12.211602 cg07152869      34.67321
## 101 32.4918424 14.284514  34.474142 cg08198851      34.47414
## 102 33.3990786 34.361560  22.016766 cg00689685      34.36156
## 103 23.6005807 34.304993  30.468770 cg00675157      34.30499
## 104 33.6208352 29.434168  34.244628 cg14307563      34.24463
## 105 18.0087674 33.891664  16.710074 cg07480176      33.89166
## 106 29.3104005 32.346665  33.615513 cg24861747      33.61551
## 107 26.8480472 24.509324  33.615241 cg01933473      33.61524
## 108 30.7311397 28.756002  33.587076 cg26069044      33.58708
## 109 33.5837621 18.489991  27.013304 cg11247378      33.58376
## 110  9.8866482 33.445459  27.570780 cg03071582      33.44546
## 111 21.6458703 31.711592  33.324979 cg19377607      33.32498
## 112 21.3482102 21.859410  33.231961 cg03088219      33.23196
## 113 30.9487918 15.495092  32.814968 cg20685672      32.81497
## 114 32.5626185  8.111475  27.626780 cg25758034      32.56262
## 115 32.4465565 23.319557  22.746918 cg06112204      32.44656
## 116 29.2607042 12.499356  32.434410 cg08861434      32.43441
## 117 26.5397967 32.336843  30.530235 cg16788319      32.33684
## 118 23.2864860 31.787986  32.316051 cg00696044      32.31605
## 119 11.8854076 25.029164  32.199893 cg12784167      32.19989
## 120 18.7416463 13.572058  32.182320 cg08779649      32.18232
## 121 25.3778152 31.987940  26.855880 cg12738248      31.98794
## 122 30.3577137 18.983288  31.966732 cg21697769      31.96673
## 123 31.6246521 26.850609  31.747640 cg16715186      31.74764
## 124 23.2012161 31.453217  29.395608 cg18821122      31.45322
## 125 31.2180154 21.950730  12.433383 cg15633912      31.21802
## 126 27.7131591 30.275814  31.034618 cg04248279      31.03462
## 127 30.9693924 26.006295  29.916371        PC1      30.96939
## 128 30.5366925 22.831281  12.174570 cg03129555      30.53669
## 129 25.9166007 23.145547  30.504210 cg15865722      30.50421
## 130 26.9351172 30.327838  26.409878 cg03660162      30.32784
## 131 16.3975287 30.173891  18.247525 cg26474732      30.17389
## 132 29.9984974 21.498342  24.947071 cg06378561      29.99850
## 133 29.9174258 17.304786  27.198307 cg13080267      29.91743
## 134 26.2555342 29.565324  19.362089 cg17738613      29.56532
## 135 11.0786764 28.941198  16.463878 cg22274273      28.94120
## 136 24.7057168 28.302285  27.745206 cg00272795      28.30229
## 137 20.9562058 23.838892  28.153301 cg16749614      28.15330
## 138 15.4188235 27.981804  18.241912 cg02932958      27.98180
## 139 23.3813797 27.669491  13.993777 cg08584917      27.66949
## 140 22.0653600  9.948338  27.607669 cg01413796      27.60767
## 141 25.2568598 23.157348  27.477514 cg10750306      27.47751
## 142 17.8059395 27.170186  22.531944 cg21854924      27.17019
## 143 27.0883280 24.964319  16.397580 cg12012426      27.08833
## 144 23.4371413 26.584314  21.675006 cg07138269      26.58431
## 145 25.9385491 25.725706  13.733208 cg19503462      25.93855
## 146 10.0013844 25.606045  17.249206 cg12776173      25.60605
## 147 25.3789296 22.629852   1.361451 cg02356645      25.37893
## 148 21.1725288 24.971799  16.752787 cg06536614      24.97180
## 149 20.8261680 24.426003  16.148522 cg11227702      24.42600
## 150 15.1961035 22.975841  24.257491 cg20139683      24.25749
## 151 22.1006728 20.385236  24.018683 cg11438323      24.01868
## 152 21.3477705 13.788454  23.361039 cg24851651      23.36104
## 153 15.1127270 22.916516  20.702346 cg20678988      22.91652
## 154 16.3609230 22.612790  16.644260 cg05841700      22.61279
## 155 10.2231141 21.190076  17.219737 cg25436480      21.19008
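The version notes describe a frequency / common-feature selection method: take the top N features from each model, count how often each feature appears, and keep those appearing in more than half of the models. A sketch of that method, assuming the `METHOD_FEATURE_FLAG == 1` branches above have run so that both ranked tables have a `Feature` column; the list names and `top_n` value are illustrative only.

```r
top_n <- 40

# Step 1: the top `top_n` features from each trained model's ranking.
top_per_model <- list(
  xgb = head(ordered_importance$Feature, top_n),
  rf  = head(importance_rf_model_df$Feature, top_n)
)

# Step 2: count how many models rank each feature in their top `top_n`.
feature_counts <- table(unlist(top_per_model))

# Step 3: keep the features appearing in more than half of the models.
common_features <- names(feature_counts[feature_counts > length(top_per_model) / 2])
```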
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_rf_model_df, n = 20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##            CN Dementia        MCI    Feature MaxImportance
## 1  51.3207894 12.10817 100.000000 cg15501526     100.00000
## 2  10.9138752 54.97057  78.592773 cg01153376      78.59277
## 3  48.2026308 34.71317  65.331798 cg08857872      65.33180
## 4  37.8346214 63.83543  37.870293 cg12279734      63.83543
## 5  28.4815869 58.19994  19.874651 cg06864789      58.19994
## 6  45.6839260 35.98072  57.926383 cg00962106      57.92638
## 7  57.5290605 20.65960  30.885574 cg23658987      57.52906
## 8  33.4227207 48.57046  56.947839    age.now      56.94784
## 9  13.5100360 31.94858  56.438010 cg16652920      56.43801
## 10 33.6159012 19.10483  54.046479 cg01921484      54.04648
## 11 20.8675761 18.91985  53.148657 cg14293999      53.14866
## 12 29.4613424 51.43162  52.515092 cg25259265      52.51509
## 13  0.7986121 37.06417  52.461334 cg02494911      52.46133
## 14 11.5289680 42.89435  51.496632 cg05570109      51.49663
## 15 24.8993598 51.40469  19.204564 cg21209485      51.40469
## 16 25.7024262 23.83473  49.529940 cg16579946      49.52994
## 17 25.4326260 16.41095  49.017250 cg14710850      49.01725
## 18 31.5157046 49.01572  40.989937 cg17186592      49.01572
## 19 19.1857976 48.89619  34.370457 cg14924512      48.89619
## 20 48.7427629 31.31943   9.631507 cg07523188      48.74276
## [1] "the top 20 features based on max way:"
##  [1] "cg15501526" "cg01153376" "cg08857872" "cg12279734" "cg06864789" "cg00962106" "cg23658987"
##  [8] "age.now"    "cg16652920" "cg01921484" "cg14293999" "cg25259265" "cg02494911" "cg05570109"
## [15] "cg21209485" "cg16579946" "cg14710850" "cg17186592" "cg14924512" "cg07523188"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.

if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  print(auc_value) 
  modelTrain_rf_AUC <- auc_value
  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  print(auc_value) 
  modelTrain_rf_AUC <- auc_value
  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  print(auc_value) 
  modelTrain_rf_AUC <- auc_value
  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)

  for (class in classes) {
    binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  plot(roc_curves[[1]], col = "blue",
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # match legend colours to the plotted curves (the first curve is drawn in blue)
  legend("bottomright", legend = classes,
         col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.6845
## The AUC value for class CN is: 0.6845025 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6235
## The AUC value for class Dementia is: 0.6234848 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.6347
## The AUC value for class MCI is: 0.634698

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_rf_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6475618
print(modelTrain_rf_AUC)
## [1] 0.6475618

6. SVM

6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 364, 364, 365, 363 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.7142963  0.5300926
##   0.50  0.7142719  0.5287505
##   1.00  0.7119525  0.5147656
## 
## Tuning parameter 'sigma' was held constant at a value of 0.003301995
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.003301995 and C = 0.25.
print(svm_model$bestTune)
##         sigma    C
## 1 0.003301995 0.25
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.7135069
modelTrain_mean_accuracy_cv_svm <- mean_accuracy_svm_model
print(modelTrain_mean_accuracy_cv_svm)
## [1] 0.7135069
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.938461538461538"
modelTrain_svm_trainAccuracy <-train_accuracy
print(modelTrain_svm_trainAccuracy)
## [1] 0.9384615
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_modelTrain_svm <- caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_modelTrain_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       51        5  31
##   Dementia  2       18   9
##   MCI      13        5  59
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6632          
##                  95% CI : (0.5918, 0.7295)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 1.708e-05       
##                                           
##                   Kappa : 0.4563          
##                                           
##  Mcnemar's Test P-Value : 0.02042         
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.7727         0.64286     0.5960
## Specificity             0.7165         0.93333     0.8085
## Pos Pred Value          0.5862         0.62069     0.7662
## Neg Pred Value          0.8585         0.93902     0.6552
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2642         0.09326     0.3057
## Detection Prevalence    0.4508         0.15026     0.3990
## Balanced Accuracy       0.7446         0.78810     0.7022
cm_modelTrain_svm_Accuracy <- cm_modelTrain_svm$overall["Accuracy"]
cm_modelTrain_svm_Kappa <- cm_modelTrain_svm$overall["Kappa"]
print(cm_modelTrain_svm_Accuracy)
##  Accuracy 
## 0.6632124
print(cm_modelTrain_svm_Kappa)
##     Kappa 
## 0.4562673

Let’s take a look at the feature importance of the trained model.

library(iml)

predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 648 rows and 156 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg25879395      1.047312   1.075269      1.094624         0.1543210
## 2    age.now      1.017204   1.053763      1.075269         0.1512346
## 3 cg00999469      1.032258   1.043011      1.051613         0.1496914
## 4 cg26069044      1.025806   1.043011      1.062366         0.1496914
## 5 cg05096415      1.021505   1.043011      1.062366         0.1496914
## 6 cg01921484      0.972043   1.043011      1.053763         0.1496914
plot(importance_SVM)

library(vip)

vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX",
    nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  modelTrain_svm_AUC <- auc_value
  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4|| METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  modelTrain_svm_AUC <- auc_value
  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  modelTrain_svm_AUC <- auc_value
  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  plot(roc_curves[[1]], col = "blue",
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)

   
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.5126
## The AUC value for class CN is: 0.5126461 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6271
## The AUC value for class Dementia is: 0.6270563 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.5456
## The AUC value for class MCI is: 0.545562

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_svm_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.5617548

7. Important Features

7.0 Choose Number of Top Features

# GOTO "INPUT" Session to set the Number of common features needed

NUM_COMMON_FEATURES <- NUM_COMMON_FEATURES_SET

7.1 Merge Important Features

The feature importances cannot be combined directly, since they are not all measured on the same scale; for example, the SVM model uses a different method (permutation importance) to compute feature importance.

So, let's scale the importances to bring them into the same range.

First, let's process each data frame to ensure they have a consistent format.

if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
# Process the dataframe to ensure they have consistent format.

# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"

head(importance_SVM_df_processed)

# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
importance_model_LRM1_df_processed$Feature<-rownames(importance_model_LRM1_df_processed)
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "Overall"] <- "Importance_LRM1"

head(importance_model_LRM1_df_processed)

# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed$Feature<-rownames(importance_elastic_net_model1_df_processed)
colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "Overall"] <- "Importance_ENM1"

head(importance_elastic_net_model1_df_processed)



# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)
colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"

head(importance_xgb_model_df_processed)


# RF

importance_rf_model_df_processed <- importance_rf_model_df

# The four binary flags (3-6) differ only in which pair of class columns is
# dropped after averaging, so handle them with one lookup instead of four
# near-identical if blocks.
drop_classes <- switch(as.character(METHOD_FEATURE_FLAG_NUM),
                       "3" = c("CI", "CN"),
                       "4" = c("Dementia", "CN"),
                       "5" = c("MCI", "CN"),
                       "6" = c("MCI", "Dementia"))

if (!is.null(drop_classes)) {
  
  importance_rf_model_df_processed$Importance_RF <- rowMeans(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed <-
    importance_rf_model_df_processed[, !(colnames(importance_rf_model_df_processed) %in% drop_classes)]
  
}


head(importance_rf_model_df_processed)


}

The processing above (the binary case) ensures that the data frames share the same structure, with matching ‘Importance’ and ‘Feature’ columns, in that order.

For the multiclass classification case, see below. Except for the XGBoost and SVM models, each model's feature importance is taken as the maximum importance across the classes.
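The max-across-classes rule can be sketched on toy data (a minimal illustration; the CpG names and importance values below are made up):

```r
# Toy per-class importance table: one row per feature, one column per class
imp <- data.frame(CN = c(0.2, 0.9), Dementia = c(0.5, 0.1), MCI = c(0.4, 0.3),
                  row.names = c("cg_a", "cg_b"))

# Row-wise maximum across the class columns
imp$MaxImportance <- apply(imp[, c("CN", "Dementia", "MCI")], 1, max)

print(imp$MaxImportance)
# 0.5 0.9
```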

if(METHOD_FEATURE_FLAG == 1){
  
# Process the dataframe to ensure they have consistent format.

# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"

head(importance_SVM_df_processed)

# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "MaxImportance"] <- "Importance_LRM1"
importance_model_LRM1_df_processed <- subset(importance_model_LRM1_df_processed, select = -c(Dementia,MCI, CN))
head(importance_model_LRM1_df_processed)

# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed <- subset(importance_elastic_net_model1_df_processed, select = -c(Dementia,MCI, CN))

colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "MaxImportance"] <- "Importance_ENM1"

head(importance_elastic_net_model1_df_processed)



# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)

colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"


head(importance_xgb_model_df_processed)


# RF

importance_rf_model_df_processed <- importance_rf_model_df
  
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia,MCI, CN))
  
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "MaxImportance"] <- "Importance_RF"

head(importance_rf_model_df_processed)

}

Next, let's scale the importances; here we choose min-max scaling.
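As a quick toy check of the min-max formula (the values here are made up), every input is mapped into [0, 1], with the minimum at 0 and the maximum at 1:

```r
x <- c(2, 4, 6, 10)

# Min-max scaling: (x - min) / (max - min)
scaled <- (x - min(x)) / (max(x) - min(x))

print(scaled)
# 0.00 0.25 0.50 1.00
```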

importance_list <- list(logistic = importance_model_LRM1_df_processed, 
                        xgb = importance_xgb_model_df_processed, 
                        elastic_net = importance_elastic_net_model1_df_processed, 
                        rf = importance_rf_model_df_processed, 
                        svm = importance_SVM_df_processed)


min_max_scale_Imp<-function(df){
  x<-df[, grepl("Importance_", colnames(df))]
  df[, grepl("Importance_", colnames(df))] <- (x - min(x)) / (max(x) - min(x))
  return(df)
}

for (i in seq_along(importance_list)) {
    importance_list[[i]] <- min_max_scale_Imp(importance_list[[i]])
}


# Print each data frame after scaling
print(head(importance_list[[1]]))
##      Feature Importance_LRM1
## 1        PC1       1.0000000
## 2        PC2       0.7857178
## 3        PC3       0.6786379
## 4 cg00962106       0.6281880
## 5 cg02225060       0.5084410
## 6 cg14710850       0.4928898
print(head(importance_list[[2]]))
##            Importance_XGB    Feature
## age.now         1.0000000    age.now
## cg05096415      0.5867398 cg05096415
## cg15501526      0.5396750 cg15501526
## cg00962106      0.5279227 cg00962106
## cg16652920      0.5184487 cg16652920
## cg14564293      0.5039124 cg14564293
print(head(importance_list[[3]]))
##      Feature Importance_ENM1
## 1        PC1       1.0000000
## 2        PC2       0.8852268
## 3 cg00962106       0.7275966
## 4 cg02225060       0.6173652
## 5 cg02981548       0.5869110
## 6 cg23432430       0.5696065
print(head(importance_list[[4]]))
##      Feature Importance_RF
## 1 cg15501526     1.0000000
## 2 cg01153376     0.7283689
## 3 cg08857872     0.5601036
## 4 cg12279734     0.5411165
## 5 cg06864789     0.4696092
## 6 cg00962106     0.4661381
print(head(importance_list[[5]]))
##   Importance_SVM    Feature
## 1      1.0000000 cg25879395
## 2      0.8333333    age.now
## 3      0.7500000 cg00999469
## 4      0.7500000 cg26069044
## 5      0.7500000 cg05096415
## 6      0.7500000 cg01921484

Now, let's merge the data frames of scaled feature importances.

# Merge all importances
combined_importance <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE), importance_list)

head(combined_importance)
# Replace NA with 0
combined_importance[is.na(combined_importance)] <- 0

# Exclude DX, as it's label

combined_importance <- combined_importance %>% 
  filter(Feature != "DX")

# View the filtered dataframe
head(combined_importance)
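On toy data, the outer merge plus NA fill behaves as follows (a sketch; the feature and column names below are made up): features missing from one model get importance 0 for that model.

```r
df1 <- data.frame(Feature = c("a", "b"), Importance_M1 = c(1, 2))
df2 <- data.frame(Feature = c("b", "c"), Importance_M2 = c(3, 4))

# Outer merge keeps the union of features; unmatched cells become NA
merged <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE),
                 list(df1, df2))

# Treat "not selected by this model" as zero importance
merged[is.na(merged)] <- 0

print(merged)
#   Feature Importance_M1 Importance_M2
#        a             1             0
#        b             2             3
#        c             0             4
```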

7.2 View the Important Features

7.2.1 Select Based on AVG

Here we select the top number of important features based on their average importance across the models. (See the following.)

combined_importance_AVF <- combined_importance
# Calculate average importance
combined_importance_AVF$Average_Importance <- rowMeans(combined_importance_AVF[,-1])

head(combined_importance_AVF)
combined_importance_Avg_ordered <- combined_importance_AVF[order(-combined_importance_AVF$Average_Importance),]

head(combined_importance_Avg_ordered)
# Top Number of common important features

print("the Top number of common features here is set to:")
## [1] "the Top number of common features here is set to:"
print(NUM_COMMON_FEATURES)
## [1] 20
top_Num_combined_importance_Avg_ordered <- head(combined_importance_Avg_ordered,n = NUM_COMMON_FEATURES)
print(top_Num_combined_importance_Avg_ordered)
##        Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 153        PC1      1.00000000      0.1866056       1.0000000     0.1240874      0.6666667
## 10  cg00962106      0.62818798      0.5279227       0.7275966     0.4661381      0.2500000
## 154        PC2      0.78571784      0.2487008       0.8852268     0.2192495      0.4166667
## 39  cg05096415      0.44533457      0.5867398       0.4148286     0.2849571      0.7500000
## 60  cg08857872      0.38088074      0.4675933       0.5311399     0.5601036      0.4166667
## 129 cg23432430      0.43874902      0.2739420       0.5696065     0.3185561      0.7500000
## 102 cg16652920      0.34495856      0.5184487       0.5211088     0.4472525      0.5000000
## 50  cg06864789      0.36409272      0.5038359       0.4605866     0.4696092      0.5000000
## 1      age.now      0.00000000      1.0000000       0.0000000     0.4537216      0.8333333
## 19  cg01921484      0.23359943      0.4484551       0.3846605     0.4169069      0.7500000
## 146 cg26948066      0.32904957      0.3759000       0.5078325     0.2620902      0.7500000
## 107 cg17186592      0.41619685      0.3415588       0.4285680     0.3530728      0.6666667
## 62  cg09584650      0.41042125      0.4571129       0.4770672     0.2305017      0.5833333
## 78  cg12279734      0.30333382      0.3066646       0.3294303     0.5411165      0.6666667
## 28  cg02981548      0.48692573      0.4094430       0.5869110     0.2257793      0.4166667
## 93  cg14710850      0.49288980      0.2709836       0.5415595     0.3530923      0.4166667
## 155        PC3      0.67863786      0.1650015       0.5045821     0.2153460      0.5000000
## 54  cg07152869      0.46401454      0.2123634       0.5392183     0.1710842      0.6666667
## 61  cg08861434      0.48302924      0.2358769       0.4931322     0.1426766      0.6666667
## 95  cg15501526      0.05232237      0.5396750       0.1653714     1.0000000      0.2500000
##     Average_Importance
## 153          0.5954719
## 10           0.5199691
## 154          0.5111123
## 39           0.4963720
## 60           0.4712769
## 129          0.4701707
## 102          0.4663537
## 50           0.4596249
## 1            0.4574110
## 19           0.4467244
## 146          0.4449745
## 107          0.4412126
## 62           0.4316873
## 78           0.4294424
## 28           0.4251451
## 93           0.4150384
## 155          0.4127135
## 54           0.4106694
## 61           0.4042763
## 95           0.4014738
# Top Number of common important features' name

top_Num_combined_importance_Avg_ordered_Nam <- top_Num_combined_importance_Avg_ordered$Feature

print(top_Num_combined_importance_Avg_ordered_Nam)
##  [1] "PC1"        "cg00962106" "PC2"        "cg05096415" "cg08857872" "cg23432430" "cg16652920"
##  [8] "cg06864789" "age.now"    "cg01921484" "cg26948066" "cg17186592" "cg09584650" "cg12279734"
## [15] "cg02981548" "cg14710850" "PC3"        "cg07152869" "cg08861434" "cg15501526"

Visualization of the average feature importance with a bar plot.

ggplot(combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
  geom_bar(stat = "identity") +
  coord_flip() +  # Flip coordinates to make it horizontal
  labs(title = "Feature Importance Sorted by Average Value",
       x = "Feature",
       y = "Average Importance") +
  theme_minimal()

Visualization of the top features' average importance with a bar plot.

ggplot(top_Num_combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
  geom_bar(stat = "identity") +
  coord_flip() + 
  labs(title = paste("Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Average Value"),
       x = "Feature",
       y = "Average Importance") +
  theme_minimal()

7.2.2 Select Based on Quantile

The following shows how to select the top number of important features based on a specific quantile of importance. (Here we use the median, i.e., the 50% quantile.)

Let's create a new data frame with several quantiles of feature importance for each model, order it by the 50% quantile from high to low, and select the top features based on that ordering.
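A quick sketch of how `quantile()` summarizes one feature's importances across models (toy values, not taken from the results above):

```r
# Scaled importances of one feature across five models (made up)
x <- c(0.1, 0.2, 0.3, 0.4, 0.5)

# Same probability points used in the analysis below
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

print(q)
#   0%  25%  50%  75% 100%
#  0.1  0.2  0.3  0.4  0.5
```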

quantiles <- t(apply(combined_importance[,-1], 1, function(x) quantile(x, probs = c(0,0.25, 0.5, 0.75,1))))

combined_importance_quantiles <- cbind(Feature = combined_importance$Feature, quantiles)

combined_importance_quantiles <- as.data.frame(combined_importance_quantiles)
combined_importance_quantiles$`50%` <- as.numeric(combined_importance_quantiles$`50%`)
combined_importance_quantiles$`0%` <- as.numeric(combined_importance_quantiles$`0%`)

combined_importance_quantiles$`25%` <- as.numeric(combined_importance_quantiles$`25%`)

combined_importance_quantiles$`75%` <- as.numeric(combined_importance_quantiles$`75%`)

combined_importance_quantiles$`100%` <- as.numeric(combined_importance_quantiles$`100%`)

# Sort by median importance (50th percentile)
combined_importance_quantiles <- combined_importance_quantiles[order(-combined_importance_quantiles$`50%`), ]


head(combined_importance_quantiles)
top_Num_median_features_imp <- head(combined_importance_quantiles,n = NUM_COMMON_FEATURES)
print(top_Num_median_features_imp)
##        Feature         0%        25%       50%       75%      100%
## 153        PC1 0.12408737 0.18660559 0.6666667 1.0000000 1.0000000
## 10  cg00962106 0.25000000 0.46613808 0.5279227 0.6281880 0.7275966
## 102 cg16652920 0.34495856 0.44725248 0.5000000 0.5184487 0.5211088
## 155        PC3 0.16500148 0.21534602 0.5000000 0.5045821 0.6786379
## 150 cg27452255 0.00000000 0.18098580 0.4871655 0.4910563 0.5000000
## 61  cg08861434 0.14267663 0.23587688 0.4830292 0.4931322 0.6666667
## 50  cg06864789 0.36409272 0.46058663 0.4696092 0.5000000 0.5038359
## 60  cg08857872 0.38088074 0.41666667 0.4675933 0.5311399 0.5601036
## 54  cg07152869 0.17108416 0.21236341 0.4640145 0.5392183 0.6666667
## 62  cg09584650 0.23050169 0.41042125 0.4571129 0.4770672 0.5833333
## 104 cg16749614 0.03199478 0.08835466 0.4560318 0.5406656 0.5833333
## 1      age.now 0.00000000 0.00000000 0.4537216 0.8333333 1.0000000
## 39  cg05096415 0.28495711 0.41482863 0.4453346 0.5867398 0.7500000
## 129 cg23432430 0.27394198 0.31855613 0.4387490 0.5696065 0.7500000
## 19  cg01921484 0.23359943 0.38466055 0.4169069 0.4484551 0.7500000
## 21  cg02225060 0.18921852 0.23091637 0.4166667 0.5084410 0.6173652
## 28  cg02981548 0.22577927 0.40944300 0.4166667 0.4869257 0.5869110
## 93  cg14710850 0.27098363 0.35309226 0.4166667 0.4928898 0.5415595
## 116 cg19503462 0.06025222 0.19011324 0.4166667 0.4682402 0.4778176
## 154        PC2 0.21924948 0.24870077 0.4166667 0.7857178 0.8852268
top_Num_median_features_Name<-top_Num_median_features_imp$Feature
print(top_Num_median_features_Name)
##  [1] "PC1"        "cg00962106" "cg16652920" "PC3"        "cg27452255" "cg08861434" "cg06864789"
##  [8] "cg08857872" "cg07152869" "cg09584650" "cg16749614" "age.now"    "cg05096415" "cg23432430"
## [15] "cg01921484" "cg02225060" "cg02981548" "cg14710850" "cg19503462" "PC2"

Visualization with a box plot.

library(tidyr)

long_df <- pivot_longer(combined_importance_quantiles, 
                        cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
                        names_to = "Quantile",
                        values_to = "Importance")

ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
  geom_boxplot() +
  coord_flip() +  
  labs(title = "Distribution of Feature Importances",
       x = "Feature",
       y = "Importance") +
  theme_minimal()


Visualization of the top features with a box plot.

library(tidyr)

long_df <- pivot_longer(top_Num_median_features_imp, 
                        cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
                        names_to = "Quantile",
                        values_to = "Importance")

ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
  geom_boxplot() +
  coord_flip() +
  labs(
    title = paste("Distribution of Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Median Value"),
       x = "Feature",
       y = "Importance") +
  theme_minimal()

7.2.3 Select Based on Frequency/Common

The frequency / common feature importance is computed as follows:

  1. Select the top N features (say 40) for each model. (This number is set via “NUM_COMMON_FEATURES_SET_Frequency” in the INPUT session.)
  2. Calculate how frequently each feature appears among the top N features selected in step 1.
  3. Consider any feature that appears in at least half of the models important, and collect these important features as the common features.
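The three steps above can be sketched on toy data (the feature and model names below are made up):

```r
# Step 1: top features already selected per model
top_lists <- list(m1 = c("a", "b", "c"),
                  m2 = c("b", "c", "d"),
                  m3 = c("c", "d", "e"))

# Step 2: count how many models selected each feature
all_feats <- unique(unlist(top_lists))
counts <- sapply(all_feats,
                 function(f) sum(sapply(top_lists, function(l) f %in% l)))

# Step 3: keep features appearing in at least half of the models
common <- names(counts[counts >= ceiling(length(top_lists) / 2)])

print(common)
# "b" "c" "d"
```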
n_select_frequencyWay <- NUM_COMMON_FEATURES_SET_Frequency
combined_importance_freq_ordered_df<-combined_importance_Avg_ordered
# LRM
## All_impAvg_orderby_LRM
All_impAvg_orderby_LRM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_LRM1),]
## top_impAvg_orderby_LRM
top_impAvg_orderby_LRM <- head(All_impAvg_orderby_LRM,n = n_select_frequencyWay)
top_impAvg_orderby_LRM_NAME <- top_impAvg_orderby_LRM$Feature

# XGB
## All_impAvg_orderby_XGB
All_impAvg_orderby_XGB <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_XGB),]
## top_impAvg_orderby_XGB
top_impAvg_orderby_XGB <- head(All_impAvg_orderby_XGB,n = n_select_frequencyWay)
top_impAvg_orderby_XGB_NAME <- top_impAvg_orderby_XGB$Feature


# ENM
## all_impAvg_orderby_ENM
All_impAvg_orderby_ENM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_ENM1),]
## top_impAvg_orderby_ENM
top_impAvg_orderby_ENM <- head(All_impAvg_orderby_ENM,n = n_select_frequencyWay)
top_impAvg_orderby_ENM_NAME <- top_impAvg_orderby_ENM$Feature


# RF
## all_impAvg_orderby_RF
All_impAvg_orderby_RF <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_RF),]
## top_impAvg_orderby_RF
top_impAvg_orderby_RF <- head(All_impAvg_orderby_RF,n = n_select_frequencyWay)
top_impAvg_orderby_RF_NAME <- top_impAvg_orderby_RF$Feature


# SVM
## all_impAvg_orderby_SVM
All_impAvg_orderby_SVM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_SVM),]
## top_impAvg_orderby_SVM
top_impAvg_orderby_SVM <- head(All_impAvg_orderby_SVM,n = n_select_frequencyWay)
top_impAvg_orderby_SVM_NAME <- top_impAvg_orderby_SVM$Feature
# Combine all features into a unique collection
all_features <- unique(c(top_impAvg_orderby_LRM_NAME, top_impAvg_orderby_XGB_NAME, top_impAvg_orderby_ENM_NAME,top_impAvg_orderby_RF_NAME,top_impAvg_orderby_SVM_NAME))

models<-c("LRM","XGB","ENM","RF","SVM")
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models), 
                         dimnames = list(all_features, models))

# Fill the dataframe indicating presence (1) or absence (0) of each feature in each model
for (feature in all_features) {
  feature_matrix[feature, "LRM"] <- 
    as.integer(feature %in% top_impAvg_orderby_LRM_NAME)
  feature_matrix[feature, "XGB"] <- 
    as.integer(feature %in% top_impAvg_orderby_XGB_NAME)
  feature_matrix[feature, "ENM"] <- 
    as.integer(feature %in% top_impAvg_orderby_ENM_NAME)
  feature_matrix[feature, "RF"] <- 
    as.integer(feature %in% top_impAvg_orderby_RF_NAME)
  feature_matrix[feature, "SVM"] <- 
    as.integer(feature %in% top_impAvg_orderby_SVM_NAME)
}

feature_df <- as.data.frame(feature_matrix)

print(head(feature_df))
##            LRM XGB ENM RF SVM
## PC1          1   0   1  0   1
## PC2          1   0   1  0   0
## PC3          1   0   1  0   0
## cg00962106   1   1   1  1   0
## cg02225060   1   0   1  0   0
## cg14710850   1   0   1  1   0

For quick reading, we count how many times each feature appears by computing the row sums, and add this count as a new column in the data frame.

feature_df$Total_Count <- rowSums(feature_df[,1:5])
feature_df <- feature_df[order(-feature_df$Total_Count), ]
frequency_feature_df_RAW_ordered<-feature_df
print(feature_df)
##            LRM XGB ENM RF SVM Total_Count
## cg00962106   1   1   1  1   0           4
## PC1          1   0   1  0   1           3
## cg14710850   1   0   1  1   0           3
## cg02981548   1   1   1  0   0           3
## cg08861434   1   0   1  0   1           3
## cg07152869   1   0   1  0   1           3
## cg05096415   1   1   0  0   1           3
## cg23432430   1   0   1  0   1           3
## cg17186592   1   0   0  1   1           3
## cg09584650   1   1   1  0   0           3
## age.now      0   1   0  1   1           3
## cg16652920   0   1   1  1   0           3
## cg06864789   0   1   1  1   0           3
## cg08857872   0   1   1  1   0           3
## cg01921484   0   1   0  1   1           3
## cg26948066   0   1   1  0   1           3
## PC2          1   0   1  0   0           2
## PC3          1   0   1  0   0           2
## cg02225060   1   0   1  0   0           2
## cg27452255   1   0   1  0   0           2
## cg19503462   1   0   1  0   0           2
## cg16749614   1   0   1  0   0           2
## cg11133939   1   0   1  0   0           2
## cg15501526   0   1   0  1   0           2
## cg25259265   0   1   0  1   0           2
## cg01128042   0   1   0  0   1           2
## cg02494911   0   1   0  1   0           2
## cg12279734   0   0   0  1   1           2
## cg00247094   1   0   0  0   0           1
## cg16715186   1   0   0  0   0           1
## cg03129555   1   0   0  0   0           1
## cg14564293   0   1   0  0   0           1
## cg04412904   0   1   0  0   0           1
## cg16771215   0   1   0  0   0           1
## cg02621446   0   1   0  0   0           1
## cg15865722   0   1   0  0   0           1
## cg03327352   0   1   0  0   0           1
## cg02372404   0   0   1  0   0           1
## cg01153376   0   0   0  1   0           1
## cg23658987   0   0   0  1   0           1
## cg14293999   0   0   0  1   0           1
## cg05570109   0   0   0  1   0           1
## cg21209485   0   0   0  1   0           1
## cg16579946   0   0   0  1   0           1
## cg14924512   0   0   0  1   0           1
## cg07523188   0   0   0  1   0           1
## cg25879395   0   0   0  0   1           1
## cg26757229   0   0   0  0   1           1
## cg26069044   0   0   0  0   1           1
## cg00999469   0   0   0  0   1           1
## cg24861747   0   0   0  0   1           1
## cg01013522   0   0   0  0   1           1
## cg05234269   0   0   0  0   1           1
## cg00616572   0   0   0  0   1           1
## cg01680303   0   0   0  0   1           1

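As a self-contained illustration, the counting logic above (per-model top lists → 0/1 indicator matrix → row sums → majority cutoff) can be reproduced on toy data. The model names and feature names below are hypothetical, and the cutoff is derived from the number of models rather than hardcoded:

```r
# Toy sketch of the frequency-counting steps (hypothetical names, not the real run)
top_lists <- list(
  LRM = c("f1", "f2", "f3"),
  XGB = c("f1", "f3", "f4"),
  RF  = c("f2", "f3", "f5")
)
all_feats <- sort(unique(unlist(top_lists)))
# 0/1 indicator: does each feature appear in each model's top list?
ind <- sapply(top_lists, function(top) as.integer(all_feats %in% top))
rownames(ind) <- all_feats
ind <- as.data.frame(ind)
ind$Total_Count <- rowSums(ind)
# keep features that appear in at least half of the models (>= 2 of 3 here)
common <- rownames(ind)[ind$Total_Count >= ceiling(length(top_lists) / 2)]
print(common)
## [1] "f1" "f2" "f3"
```

Here `ceiling(length(top_lists) / 2)` plays the role of the "appears in more than half of the models" rule described in the header.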
Combine the frequency table with the importance data frame.

all_features <- union(combined_importance_freq_ordered_df$Feature, rownames(feature_df))
# Note: the combined importance data frame used here is the one before filtering.
# Combine the tables for the common-feature selection method:
# if a feature from the importance table is not in the frequency table,
# add it and set its indicator values to zero.
feature_df_full <- data.frame(Feature = all_features)
feature_df_full <- merge(feature_df_full, feature_df, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_df_full[is.na(feature_df_full)] <- 0


# For top_impAvg_ordered
all_impAvg_ordered_full <- data.frame(Feature = all_features)
all_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,all_impAvg_ordered_full, by.x = "Feature", by.y = "Feature", all.x = TRUE)
all_impAvg_ordered_full[is.na(all_impAvg_ordered_full)] <- 0
all_combined_df_impAvg <- merge(feature_df_full, all_impAvg_ordered_full, by = "Feature", all = TRUE)

print(head(feature_df_full))
##      Feature LRM XGB ENM RF SVM Total_Count
## 1    age.now   0   1   0  1   1           3
## 2 cg00154902   0   0   0  0   0           0
## 3 cg00247094   1   0   0  0   0           1
## 4 cg00272795   0   0   0  0   0           0
## 5 cg00322003   0   0   0  0   0           0
## 6 cg00616572   0   0   0  0   1           1
print(head(all_impAvg_ordered_full))
##      Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 1    age.now      0.00000000      1.0000000       0.0000000    0.45372158      0.8333333
## 2 cg00154902      0.08879263      0.2688349       0.3713159    0.33502754      0.5833333
## 3 cg00247094      0.41278095      0.2185408       0.4245031    0.23013585      0.5833333
## 4 cg00272795      0.21295491      0.1985510       0.2309999    0.09024509      0.3333333
## 5 cg00322003      0.21752832      0.1465702       0.3430531    0.27821774      0.5833333
## 6 cg00616572      0.28381319      0.1715595       0.3572845    0.17891065      0.6666667
##   Average_Importance
## 1          0.4574110
## 2          0.3294609
## 3          0.3738588
## 4          0.2132168
## 5          0.3137405
## 6          0.3316469
print(head(all_combined_df_impAvg))
##      Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1
## 1    age.now   0   1   0  1   1           3      0.00000000      1.0000000       0.0000000
## 2 cg00154902   0   0   0  0   0           0      0.08879263      0.2688349       0.3713159
## 3 cg00247094   1   0   0  0   0           1      0.41278095      0.2185408       0.4245031
## 4 cg00272795   0   0   0  0   0           0      0.21295491      0.1985510       0.2309999
## 5 cg00322003   0   0   0  0   0           0      0.21752832      0.1465702       0.3430531
## 6 cg00616572   0   0   0  0   1           1      0.28381319      0.1715595       0.3572845
##   Importance_RF Importance_SVM Average_Importance
## 1    0.45372158      0.8333333          0.4574110
## 2    0.33502754      0.5833333          0.3294609
## 3    0.23013585      0.5833333          0.3738588
## 4    0.09024509      0.3333333          0.2132168
## 5    0.27821774      0.5833333          0.3137405
## 6    0.17891065      0.6666667          0.3316469
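The merge-and-zero-fill pattern used above can be seen on a toy example (hypothetical feature names): a full outer join keeps every feature present in either table, and the `NA` cells left behind are set to 0.

```r
# Toy illustration of merge(..., all = TRUE) followed by zero-filling NAs
freq <- data.frame(Feature = c("a", "b"), Total_Count = c(3, 2))
imp  <- data.frame(Feature = c("a", "c"), Average_Importance = c(0.9, 0.4))

combined <- merge(freq, imp, by = "Feature", all = TRUE)  # full outer join
combined[is.na(combined)] <- 0  # a feature absent from one table gets 0 there

print(combined)
# rows: a (3, 0.9), b (2, 0.0), c (0, 0.4)
```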

Frequency Feature Selection

Choose the mutual (common) important features: a feature is kept when it appears in the top selected feature lists of at least half of the models (i.e. 3 of the 5 models in our case).

if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG==6){
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count>=3,])
df_process_mutual<-processed_data[,c("DX",df_process_mutual_FeatureName)]

print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
if(METHOD_FEATURE_FLAG == 1){
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count>=3,])
df_process_mutual<-processed_data_m1[,c("DX",df_process_mutual_FeatureName)]

print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
## [1] "The number of final used features of common importance method: 16"
print(df_process_mutual_FeatureName)
##  [1] "cg00962106" "PC1"        "cg14710850" "cg02981548" "cg08861434" "cg07152869" "cg05096415"
##  [8] "cg23432430" "cg17186592" "cg09584650" "age.now"    "cg16652920" "cg06864789" "cg08857872"
## [15] "cg01921484" "cg26948066"
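The cutoff `Total_Count >= 3` is hardcoded for the five models used here. A sketch (an assumption, not the original code) that derives the same majority cutoff from the model indicator column names, so it adapts if models are added or removed:

```r
# Derive the majority cutoff from the number of model columns instead of
# hardcoding 3 (assumes the indicator columns are named as below)
model_cols <- c("LRM", "XGB", "ENM", "RF", "SVM")
majority_cutoff <- ceiling(length(model_cols) / 2)
print(majority_cutoff)
## [1] 3
# feature_df$Total_Count >= majority_cutoff would then replace the literal 3
```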

Importance of these features:

Top_Frequency_Feature_importance <- combined_importance_freq_ordered_df[
    combined_importance_freq_ordered_df$Feature %in% df_process_mutual_FeatureName,
]

print(Top_Frequency_Feature_importance)
##        Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 153        PC1       1.0000000      0.1866056       1.0000000     0.1240874      0.6666667
## 10  cg00962106       0.6281880      0.5279227       0.7275966     0.4661381      0.2500000
## 39  cg05096415       0.4453346      0.5867398       0.4148286     0.2849571      0.7500000
## 60  cg08857872       0.3808807      0.4675933       0.5311399     0.5601036      0.4166667
## 129 cg23432430       0.4387490      0.2739420       0.5696065     0.3185561      0.7500000
## 102 cg16652920       0.3449586      0.5184487       0.5211088     0.4472525      0.5000000
## 50  cg06864789       0.3640927      0.5038359       0.4605866     0.4696092      0.5000000
## 1      age.now       0.0000000      1.0000000       0.0000000     0.4537216      0.8333333
## 19  cg01921484       0.2335994      0.4484551       0.3846605     0.4169069      0.7500000
## 146 cg26948066       0.3290496      0.3759000       0.5078325     0.2620902      0.7500000
## 107 cg17186592       0.4161969      0.3415588       0.4285680     0.3530728      0.6666667
## 62  cg09584650       0.4104212      0.4571129       0.4770672     0.2305017      0.5833333
## 28  cg02981548       0.4869257      0.4094430       0.5869110     0.2257793      0.4166667
## 93  cg14710850       0.4928898      0.2709836       0.5415595     0.3530923      0.4166667
## 54  cg07152869       0.4640145      0.2123634       0.5392183     0.1710842      0.6666667
## 61  cg08861434       0.4830292      0.2358769       0.4931322     0.1426766      0.6666667
##     Average_Importance
## 153          0.5954719
## 10           0.5199691
## 39           0.4963720
## 60           0.4712769
## 129          0.4701707
## 102          0.4663537
## 50           0.4596249
## 1            0.4574110
## 19           0.4467244
## 146          0.4449745
## 107          0.4412126
## 62           0.4316873
## 28           0.4251451
## 93           0.4150384
## 54           0.4106694
## 61           0.4042763
ggplot(Top_Frequency_Feature_importance, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
  geom_bar(stat = "identity") +
  coord_flip() + 
  labs(title = "Feature Importance Selected by the Frequency Method and Sorted by Average Value",
       x = "Feature",
       y = "Average Importance") +
  theme_minimal()

Important features selected by the frequency method but not by the mean method

# Check whether every feature from the mutual (frequency) method is also selected
# by the mean method, and print any features that are not.

all(df_process_mutual_FeatureName %in% top_Num_combined_importance_Avg_ordered_Nam)
## [1] TRUE
Mutual_not_in_Mean <- setdiff(df_process_mutual_FeatureName, top_Num_combined_importance_Avg_ordered_Nam)
print(Mutual_not_in_Mean)
## character(0)

Save as RData (May Not Be Needed)

Overview of the Data Frame Variables.

Phenotype Part Data Frame: “phenoticPart_RAW”

RAW Merged Data Frame: “merged_df_raw”

Ordered Feature Importance Based on Quantile Data Frame: “combined_importance_quantiles”

Ordered Feature Importance Based on Mean Data Frame: “combined_importance_Avg_ordered”

Ordered Feature Frequency / Common Data Frames:

  • “frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total count.

  • “feature_df_full”: the frequencies of all features produced by the steps of the frequency method; not ordered.

  • “all_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.

head(phenoticPart_RAW)
# 
# save(NUM_COMMON_FEATURES,
#      combined_importance_quantiles,
#      combined_importance_Avg_ordered,
#      frequency_feature_df_RAW_ordered,
#      top_Num_median_features_Name,
#      top_Num_combined_importance_Avg_ordered_Nam,
#      file = "Part2_V8_08_top_features_5KCpGs.RData")
# 
# save(processed_data_m3,processed_data_m3_df,AfterProcess_FeatureName_m3,file = "Part2_V8_08_BinaryMerged_5KCpGs.RData")
# 
# save(phenoticPart_RAW, merged_df_raw, file = "PhenotypeAndMerged.RData")

8. Feature Selection and Output

8.0 Input: Number of Top Features and Method Choice.

The feature selection methods:

  1. based on mean feature importance (set “INPUT_Method_Mean_Choose = TRUE”)
  2. based on median quantile feature importance (set “INPUT_Method_Median_Choose = TRUE”)
  3. based on feature frequency importance (set “INPUT_Method_Frequency_Choose = TRUE”)
    • Comment: with the frequency method, the input number of features N is only used for the first step, selecting the top N features for each model, so the final number of features kept may differ from N.
  4. Setting a method’s flag to FALSE skips generating the data for that method; to output the data for every method, set all flags to TRUE. In summary, setting a flag to TRUE outputs the data set selected by the corresponding method.
Number_fea_input <- INPUT_NUMBER_FEATURES

Flag_8mean <- INPUT_Method_Mean_Choose 
Flag_8median <- INPUT_Method_Median_Choose 
Flag_8Fequency <- INPUT_Method_Frequency_Choose 
print(paste("the Top number of features here is set to:", Number_fea_input))
## [1] "the Top number of features here is set to: 250"
Flag_8mean
## [1] TRUE
Flag_8median
## [1] TRUE
Flag_8Fequency
## [1] TRUE
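Downstream, each flag gates whether the corresponding data set is generated. A minimal sketch of that gating pattern (the helper function and labels are placeholders, not the original code):

```r
# Hypothetical gating helper: produce a label only when the method flag is TRUE
run_if <- function(flag, label) {
  if (isTRUE(flag)) paste("output based on", label) else NULL
}

outputs <- Filter(Negate(is.null), list(
  run_if(TRUE,  "mean importance"),       # generated
  run_if(FALSE, "median importance"),     # skipped in this toy run
  run_if(TRUE,  "frequency importance")   # generated
))
length(outputs)
## [1] 2
```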

8.1 Selected Features for Output

Based on Mean

selected_impAvg_ordered <- head(combined_importance_Avg_ordered,n = Number_fea_input)
print(head(selected_impAvg_ordered))
##        Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 153        PC1       1.0000000      0.1866056       1.0000000     0.1240874      0.6666667
## 10  cg00962106       0.6281880      0.5279227       0.7275966     0.4661381      0.2500000
## 154        PC2       0.7857178      0.2487008       0.8852268     0.2192495      0.4166667
## 39  cg05096415       0.4453346      0.5867398       0.4148286     0.2849571      0.7500000
## 60  cg08857872       0.3808807      0.4675933       0.5311399     0.5601036      0.4166667
## 129 cg23432430       0.4387490      0.2739420       0.5696065     0.3185561      0.7500000
##     Average_Importance
## 153          0.5954719
## 10           0.5199691
## 154          0.5111123
## 39           0.4963720
## 60           0.4712769
## 129          0.4701707
print(dim(selected_impAvg_ordered))
## [1] 155   7
selected_impAvg_ordered_NAME <- selected_impAvg_ordered$Feature

print(head(selected_impAvg_ordered_NAME))
## [1] "PC1"        "cg00962106" "PC2"        "cg05096415" "cg08857872" "cg23432430"
df_selected_Mean <- processed_dataFrame[,c("DX",selected_impAvg_ordered_NAME)]
print(head(df_selected_Mean))
##                           DX          PC1 cg00962106           PC2 cg05096415 cg08857872
## 200223270003_R02C01      MCI -0.214185447  0.9124898  1.470293e-02  0.9182527  0.3395280
## 200223270003_R03C01       CN -0.172761185  0.5375751  5.745834e-02  0.5177819  0.8181845
## 200223270003_R06C01       CN -0.003667305  0.5040948  8.372861e-02  0.6288426  0.2970779
## 200223270003_R07C01 Dementia -0.186779607  0.9039029 -1.117250e-02  0.6060271  0.2954090
## 200223270006_R01C01      MCI  0.026814649  0.8961556  1.650735e-05  0.5599588  0.8935876
## 200223270006_R04C01       CN -0.037862929  0.8857597  1.571950e-02  0.5441200  0.8901338
##                     cg23432430 cg16652920 cg06864789  age.now cg01921484 cg26948066 cg17186592
## 200223270003_R02C01  0.9482702  0.9436000 0.05369415 82.40000 0.90985496  0.4685225  0.9230463
## 200223270003_R03C01  0.9455418  0.9431222 0.46053125 78.60000 0.90931369  0.5026045  0.8593448
## 200223270003_R06C01  0.9418716  0.9457161 0.87513655 80.40000 0.92044873  0.9101976  0.8467599
## 200223270003_R07C01  0.9426559  0.9419785 0.49020327 78.16441 0.91674311  0.9379543  0.4986373
## 200223270006_R01C01  0.9461736  0.9529417 0.47852685 62.90000 0.02943747  0.9120181  0.8978999
## 200223270006_R04C01  0.9508404  0.9492648 0.05423587 80.67796 0.89057041  0.8868608  0.9239750
##                     cg09584650 cg12279734 cg02981548 cg14710850          PC3 cg07152869
## 200223270003_R02C01 0.08230254  0.6435368  0.1342571  0.8048592 -0.014043316  0.8284151
## 200223270003_R03C01 0.09661586  0.1494651  0.5220037  0.8090950  0.005055871  0.5050630
## 200223270003_R06C01 0.52399749  0.8760759  0.5098965  0.8285902  0.029143653  0.8352490
## 200223270003_R07C01 0.11587211  0.8674214  0.5660985  0.8336457 -0.032302430  0.5194300
## 200223270006_R01C01 0.42115185  0.6454450  0.5678714  0.8500725  0.052947950  0.5025709
## 200223270006_R04C01 0.56043178  0.8660058  0.5079859  0.8207247 -0.008685676  0.8080916
##                     cg08861434 cg15501526 cg25259265 cg02225060 cg24859648 cg11133939
## 200223270003_R02C01  0.8768306  0.6362531  0.4356646  0.6828159 0.83777536  0.1282694
## 200223270003_R03C01  0.4352647  0.6319253  0.8893591  0.8265195 0.44392797  0.5920898
## 200223270003_R06C01  0.8698813  0.7435100  0.4201700  0.5209552 0.03341185  0.5127706
## 200223270003_R07C01  0.4709249  0.7756577  0.4455517  0.8078889 0.43582347  0.8474176
## 200223270006_R01C01  0.8618532  0.3230777  0.8423337  0.6084903 0.03087161  0.8589133
## 200223270006_R04C01  0.9058965  0.8342695  0.8460736  0.7638781 0.02588024  0.5246557
##                     cg25879395 cg02621446 cg00247094 cg02494911 cg16771215 cg24861747
## 200223270003_R02C01 0.88130864  0.8731313  0.5399349  0.3049435 0.88389723  0.3540897
## 200223270003_R03C01 0.02603438  0.8095534  0.9315640  0.2416332 0.07196933  0.4309505
## 200223270003_R06C01 0.91060615  0.7511582  0.5177874  0.2520909 0.09949974  0.8071462
## 200223270003_R07C01 0.89205942  0.8773609  0.5377765  0.2457032 0.64234023  0.3347317
## 200223270006_R01C01 0.47886249  0.2046541  0.9109309  0.8045030 0.62679274  0.3544795
## 200223270006_R04C01 0.02145248  0.7963817  0.5266535  0.7489283 0.06970175  0.5997840
##                     cg01153376 cg04412904 cg20913114 cg01128042 cg10240127 cg14564293
## 200223270003_R02C01  0.4872148 0.05088595 0.36510482  0.9113420  0.9250553 0.52089591
## 200223270003_R03C01  0.9639670 0.07717659 0.80382984  0.5328806  0.9403255 0.04000662
## 200223270003_R06C01  0.2242410 0.08253743 0.03158439  0.5222757  0.9056974 0.04959460
## 200223270003_R07C01  0.5155654 0.06217431 0.81256840  0.5141721  0.9396217 0.03114773
## 200223270006_R01C01  0.9588916 0.11888769 0.81502059  0.9321215  0.9262370 0.51703196
## 200223270006_R04C01  0.9586876 0.08885846 0.90468830  0.5050081  0.9240497 0.51535010
##                     cg16749614 cg01013522 cg16579946 cg03129555 cg02372404 cg05234269
## 200223270003_R02C01  0.8678741  0.6251168  0.6306315  0.6079616 0.03598249 0.93848584
## 200223270003_R03C01  0.8539348  0.8862821  0.6648766  0.5785498 0.02767285 0.57461229
## 200223270003_R06C01  0.5874127  0.5425308  0.6455081  0.9137818 0.03127855 0.02467208
## 200223270003_R07C01  0.5555391  0.8429862  0.8979650  0.9043041 0.55685785 0.56516794
## 200223270006_R01C01  0.8026346  0.0480531  0.6886498  0.9286357 0.02587736 0.94829529
## 200223270006_R04C01  0.7903978  0.8240222  0.6766907  0.9088564 0.02828648 0.56298286
##                     cg12146221 cg12228670 cg14924512 cg27452255 cg16715186 cg00616572
## 200223270003_R02C01  0.2049284  0.8632174  0.5303907  0.9001010  0.2742789  0.9335067
## 200223270003_R03C01  0.1814927  0.8496212  0.9160885  0.6593379  0.7946153  0.9214079
## 200223270003_R06C01  0.8619250  0.8738949  0.9088414  0.9012217  0.8124316  0.9113633
## 200223270003_R07C01  0.1238469  0.8362189  0.9081681  0.8898635  0.7773263  0.9160238
## 200223270006_R01C01  0.2021598  0.8079694  0.9111789  0.5779792  0.8334531  0.4861334
## 200223270006_R04C01  0.1383786  0.6966666  0.5331753  0.8809143  0.8039945  0.9067928
##                     cg05570109 cg00154902 cg14293999 cg17421046 cg15775217 cg09854620
## 200223270003_R02C01  0.3466611  0.5137741  0.2836710  0.9026993  0.5707441  0.5220587
## 200223270003_R03C01  0.5866750  0.8540746  0.9172023  0.9112100  0.9168327  0.8739646
## 200223270003_R06C01  0.4046471  0.8188126  0.9168166  0.8952031  0.6042521  0.8973149
## 200223270003_R07C01  0.6014355  0.4625776  0.9188336  0.9268852  0.9062231  0.8958863
## 200223270006_R01C01  0.5774881  0.4690086  0.1971116  0.1118337  0.9083515  0.9075331
## 200223270006_R04C01  0.8756826  0.4547219  0.9030919  0.4174370  0.6383270  0.9318820
##                     cg19503462 cg26757229 cg06378561 cg01680303 cg06715136 cg15535896
## 200223270003_R02C01  0.7951675  0.6723726  0.9389306  0.5095174  0.3400192  0.3382952
## 200223270003_R03C01  0.4537684  0.1422661  0.9377503  0.1344941  0.9259109  0.9253926
## 200223270003_R06C01  0.6997359  0.7933794  0.5154019  0.7573869  0.9079807  0.3320191
## 200223270003_R07C01  0.7189778  0.8074830  0.9403569  0.4772204  0.6782105  0.9409104
## 200223270006_R01C01  0.7301755  0.5265692  0.4956816  0.1176263  0.8369052  0.9326027
## 200223270006_R04C01  0.4207207  0.7341953  0.9268832  0.5133033  0.8807568  0.9156401
##                     cg00322003 cg27341708 cg03084184 cg26219488 cg18339359 cg06697310
## 200223270003_R02C01  0.1759911 0.48846610  0.8162981  0.9336638  0.8824858  0.8454609
## 200223270003_R03C01  0.5702070 0.02613847  0.7877128  0.9134707  0.9040272  0.8653044
## 200223270003_R06C01  0.3077122 0.86893582  0.4546397  0.9261878  0.8552121  0.2405168
## 200223270003_R07C01  0.6104341 0.02642300  0.7812413  0.9217866  0.3073106  0.8479193
## 200223270006_R01C01  0.6147419 0.47573455  0.7818230  0.4929692  0.8973742  0.8206613
## 200223270006_R04C01  0.2293759 0.89411974  0.7725853  0.9431574  0.2292800  0.7839595
##                     cg10369879 cg10738648 cg06536614 cg26069044 cg20685672 cg03327352
## 200223270003_R02C01  0.9218784 0.44931577  0.5824474 0.92401867 0.67121006  0.8851712
## 200223270003_R03C01  0.3149306 0.49894016  0.5746694 0.94072227 0.79320906  0.8786878
## 200223270003_R06C01  0.9141081 0.05552024  0.5773468 0.93321315 0.66136456  0.3042310
## 200223270003_R07C01  0.9054415 0.03730440  0.5848917 0.56567694 0.80838304  0.8273211
## 200223270006_R01C01  0.2917862 0.54952781  0.5669919 0.94369927 0.08291414  0.8774082
## 200223270006_R04C01  0.9200403 0.59358167  0.5718514 0.02040391 0.84460055  0.8829492
##                     cg00999469 cg23658987 cg05841700 cg01667144 cg15865722 cg13885788
## 200223270003_R02C01  0.3274080 0.79757644  0.2923544  0.8971484 0.89438595  0.9380618
## 200223270003_R03C01  0.2857719 0.07511718  0.9146488  0.3175389 0.90194372  0.9369476
## 200223270003_R06C01  0.2499229 0.10177571  0.3737990  0.9238364 0.92118977  0.5163017
## 200223270003_R07C01  0.2819622 0.46747992  0.5046468  0.8739442 0.09230759  0.9183376
## 200223270006_R01C01  0.2933539 0.76831297  0.8419031  0.2931961 0.93422668  0.5525542
## 200223270006_R04C01  0.2966623 0.08988532  0.9286652  0.8616530 0.92220002  0.9328289
##                     cg14527649 cg23161429 cg20370184 cg18821122 cg07523188 cg12534577
## 200223270003_R02C01  0.2678912  0.8956965 0.37710950  0.9291309  0.7509183  0.8585231
## 200223270003_R03C01  0.7954683  0.9099619 0.05737964  0.5901603  0.1524386  0.8493466
## 200223270003_R06C01  0.8350610  0.8833895 0.04740505  0.5779620  0.7127592  0.8395241
## 200223270003_R07C01  0.8428684  0.9134709 0.83572095  0.9251431  0.8464983  0.8511384
## 200223270006_R01C01  0.8231348  0.8738558 0.04056608  0.9217018  0.7847738  0.8804655
## 200223270006_R04C01  0.8022444  0.9104210 0.04038589  0.5412250  0.8231277  0.3029013
##                     cg02356645 cg03982462 cg04248279 cg13080267 cg27639199 cg08198851
## 200223270003_R02C01  0.5105903  0.8562777  0.8534976 0.78936656 0.67515415  0.6578905
## 200223270003_R03C01  0.5833923  0.6023731  0.8458854 0.78371483 0.67552763  0.6578186
## 200223270003_R06C01  0.5701428  0.8778458  0.8332786 0.09436069 0.06233093  0.1272153
## 200223270003_R07C01  0.5683381  0.8860227  0.3303204 0.09351259 0.05701332  0.8351465
## 200223270006_R01C01  0.5233692  0.8703107  0.5966878 0.45173796 0.05037694  0.8791156
## 200223270006_R04C01  0.9188670  0.8792860  0.8939599 0.49866715 0.08144161  0.1423737
##                     cg11331837 cg24873924 cg20507276 cg25561557 cg22274273 cg12682323
## 200223270003_R02C01 0.03692842  0.3060635 0.12238910 0.76736369  0.4209386  0.9397956
## 200223270003_R03C01 0.57150125  0.8640985 0.38721972 0.03851635  0.4246379  0.9003940
## 200223270003_R06C01 0.03182862  0.8259149 0.47978438 0.47259480  0.4196796  0.9157877
## 200223270003_R07C01 0.03832164  0.8333940 0.02261996 0.43364249  0.4164100  0.9048877
## 200223270006_R01C01 0.93008298  0.8761177 0.37465798 0.46211439  0.7951105  0.1065347
## 200223270006_R04C01 0.54004452  0.8585363 0.03570795 0.44651530  0.0229810  0.8836232
##                     cg17738613 cg21209485  cg03088219 cg03660162 cg10750306 cg27272246
## 200223270003_R02C01  0.6879612  0.8865053 0.844002862  0.8691767 0.04919915  0.8615873
## 200223270003_R03C01  0.6582258  0.8714878 0.007435243  0.5160770 0.55160081  0.8705287
## 200223270003_R06C01  0.1022257  0.2292550 0.120155222  0.9026304 0.54694332  0.8103777
## 200223270003_R07C01  0.8960156  0.2351526 0.826554308  0.5305691 0.59824543  0.0310881
## 200223270006_R01C01  0.8850702  0.8882046 0.066294915  0.9257451 0.53158639  0.7686536
## 200223270006_R04C01  0.8481916  0.2292483 0.574738383  0.8935772 0.05646838  0.4403542
##                     cg11438323 cg12738248 cg21854924 cg20139683 cg16178271 cg07028768
## 200223270003_R02C01  0.4863471 0.85430866  0.8729132  0.8717075  0.6445416  0.4496851
## 200223270003_R03C01  0.8984559 0.88010292  0.7162342  0.9059433  0.6178075  0.8536078
## 200223270003_R06C01  0.8722772 0.51121855  0.7520990  0.8962554  0.6641952  0.8356936
## 200223270003_R07C01  0.5026756 0.09131476  0.8641284  0.9218012  0.7148058  0.4245893
## 200223270006_R01C01  0.8809646 0.91529345  0.6498895  0.1708472  0.6138954  0.8835151
## 200223270006_R04C01  0.8717937 0.91911405  0.5943113  0.1067122  0.9414188  0.4514661
##                     cg26474732 cg00675157 cg23916408 cg05321907 cg17429539 cg06950937
## 200223270003_R02C01  0.7843252  0.9188438  0.1942275  0.2880477  0.7860900  0.8910968
## 200223270003_R03C01  0.8184088  0.9242325  0.9154993  0.1782629  0.7100923  0.2889345
## 200223270003_R06C01  0.7358417  0.9254708  0.8886255  0.8427929  0.7660838  0.9143801
## 200223270003_R07C01  0.7509296  0.5447244  0.8872447  0.8320504  0.6984969  0.8891079
## 200223270006_R01C01  0.8294938  0.5173554  0.2219945  0.2422218  0.6508597  0.8868617
## 200223270006_R04C01  0.8033167  0.9247232  0.1520624  0.2429551  0.2828452  0.9093273
##                     cg14240646 cg27086157 cg25758034 cg11247378 cg19377607 cg07480176
## 200223270003_R02C01  0.5391334  0.9224112  0.6114028  0.1591185 0.05377464  0.5171664
## 200223270003_R03C01  0.2538363  0.9219304  0.6649219  0.7874849 0.90570746  0.3760452
## 200223270003_R06C01  0.1864902  0.3224986  0.2393844  0.4807942 0.06636174  0.6998389
## 200223270003_R07C01  0.6402007  0.3455486  0.7071501  0.4537348 0.68788639  0.2189042
## 200223270006_R01C01  0.7696079  0.8988962  0.2301078  0.1537079 0.06338988  0.5570021
## 200223270006_R04C01  0.1490028  0.9159217  0.6891513  0.1686356 0.91551446  0.4501196
##                     cg27577781 cg11187460 cg03071582 cg12284872 cg02932958 cg12012426
## 200223270003_R02C01  0.8143535 0.03672179  0.9187811  0.8008333  0.7901008  0.9165048
## 200223270003_R03C01  0.8113185 0.92516409  0.5844421  0.7414569  0.4210489  0.9434768
## 200223270003_R06C01  0.8144274 0.03109553  0.6245558  0.7725267  0.3825995  0.9220044
## 200223270003_R07C01  0.7970617 0.53283119  0.9283683  0.7573369  0.7617081  0.9241284
## 200223270006_R01C01  0.8640044 0.54038146  0.5715416  0.7201607  0.8431126  0.9327143
## 200223270006_R04C01  0.8840237 0.91096169  0.6534650  0.8021446  0.7610084  0.9271167
##                     cg06118351 cg00696044 cg25436480 cg02320265 cg11227702 cg18819889
## 200223270003_R02C01 0.36339400 0.55608424 0.84251599  0.8853213 0.86486075  0.9156157
## 200223270003_R03C01 0.47148604 0.07552381 0.49940321  0.4686314 0.49184121  0.9004455
## 200223270003_R06C01 0.86559618 0.79270858 0.34943119  0.4838749 0.02543724  0.9054439
## 200223270003_R07C01 0.83494303 0.03548419 0.85244913  0.8986848 0.45150971  0.9089935
## 200223270006_R01C01 0.02632111 0.10714386 0.44545117  0.8987560 0.89086877  0.9065397
## 200223270006_R04C01 0.83329300 0.18420803 0.02575036  0.4768520 0.87675947  0.9242767
##                     cg06112204 cg19512141 cg24506579 cg00272795 cg21697769 cg12776173
## 200223270003_R02C01  0.5251592  0.8209161  0.5244337 0.46365138  0.8946108 0.10388038
## 200223270003_R03C01  0.8773488  0.7903543  0.5794845 0.82839260  0.2822953 0.87306345
## 200223270003_R06C01  0.8867975  0.8404684  0.9427785 0.07231279  0.8698740 0.70094907
## 200223270003_R07C01  0.5613799  0.2202759  0.9323844 0.78303831  0.9134887 0.11367159
## 200223270006_R01C01  0.9184122  0.8059589  0.9185355 0.78219952  0.2683820 0.09458405
## 200223270006_R04C01  0.9152514  0.7020247  0.4332642 0.44408249  0.2765740 0.86532175
##                     cg07138269 cg17906851 cg08779649 cg10985055 cg08584917 cg04664583
## 200223270003_R02C01  0.5002290  0.9488392 0.44449401  0.8518169  0.5663205  0.5572814
## 200223270003_R03C01  0.9426707  0.9529718 0.45076825  0.8631895  0.9019732  0.5881190
## 200223270003_R06C01  0.5057781  0.6462151 0.04810217  0.5456633  0.9187789  0.9352717
## 200223270003_R07C01  0.9400527  0.9553497 0.42715969  0.8825100  0.6007449  0.9350230
## 200223270006_R01C01  0.9321602  0.6222117 0.89313476  0.8841690  0.9069098  0.9424588
## 200223270006_R04C01  0.9333501  0.6441202 0.59523771  0.8407797  0.9263584  0.9379537
##                     cg01933473 cg00689685 cg14307563 cg12784167 cg24851651 cg15633912
## 200223270003_R02C01  0.2589014  0.7019389  0.1855966 0.81503498 0.03674702  0.1605530
## 200223270003_R03C01  0.6726133  0.8634268  0.8916957 0.02811410 0.05358297  0.9333421
## 200223270003_R06C01  0.2642560  0.6378795  0.8750052 0.03073269 0.05968923  0.8737362
## 200223270003_R07C01  0.1978068  0.8624541  0.8975663 0.84775699 0.60864179  0.9137334
## 200223270006_R01C01  0.7599441  0.6361891  0.8762842 0.83825789 0.08825834  0.9169706
## 200223270006_R04C01  0.7405661  0.6356260  0.9168614 0.45475291 0.91932068  0.8890004
##                     cg12466610 cg16788319 cg20678988 cg01413796 cg01549082
## 200223270003_R02C01 0.05767659  0.9379870  0.8438718  0.1345128  0.2924138
## 200223270003_R03C01 0.59131778  0.8913429  0.8548886  0.2830672  0.7065693
## 200223270003_R06C01 0.06939623  0.8680680  0.7786685  0.8194681  0.2895440
## 200223270003_R07C01 0.04527733  0.8811444  0.8260541  0.9007710  0.6422955
## 200223270006_R01C01 0.05212904  0.3123481  0.3295384  0.2603027  0.8471236
## 200223270006_R04C01 0.05104033  0.2995627  0.8541667  0.9207672  0.6949888
dim(df_selected_Mean)
## [1] 648 156
print(selected_impAvg_ordered_NAME)
##   [1] "PC1"        "cg00962106" "PC2"        "cg05096415" "cg08857872" "cg23432430" "cg16652920"
##   [8] "cg06864789" "age.now"    "cg01921484" "cg26948066" "cg17186592" "cg09584650" "cg12279734"
##  [15] "cg02981548" "cg14710850" "PC3"        "cg07152869" "cg08861434" "cg15501526" "cg25259265"
##  [22] "cg02225060" "cg24859648" "cg11133939" "cg25879395" "cg02621446" "cg00247094" "cg02494911"
##  [29] "cg16771215" "cg24861747" "cg01153376" "cg04412904" "cg20913114" "cg01128042" "cg10240127"
##  [36] "cg14564293" "cg16749614" "cg01013522" "cg16579946" "cg03129555" "cg02372404" "cg05234269"
##  [43] "cg12146221" "cg12228670" "cg14924512" "cg27452255" "cg16715186" "cg00616572" "cg05570109"
##  [50] "cg00154902" "cg14293999" "cg17421046" "cg15775217" "cg09854620" "cg19503462" "cg26757229"
##  [57] "cg06378561" "cg01680303" "cg06715136" "cg15535896" "cg00322003" "cg27341708" "cg03084184"
##  [64] "cg26219488" "cg18339359" "cg06697310" "cg10369879" "cg10738648" "cg06536614" "cg26069044"
##  [71] "cg20685672" "cg03327352" "cg00999469" "cg23658987" "cg05841700" "cg01667144" "cg15865722"
##  [78] "cg13885788" "cg14527649" "cg23161429" "cg20370184" "cg18821122" "cg07523188" "cg12534577"
##  [85] "cg02356645" "cg03982462" "cg04248279" "cg13080267" "cg27639199" "cg08198851" "cg11331837"
##  [92] "cg24873924" "cg20507276" "cg25561557" "cg22274273" "cg12682323" "cg17738613" "cg21209485"
##  [99] "cg03088219" "cg03660162" "cg10750306" "cg27272246" "cg11438323" "cg12738248" "cg21854924"
## [106] "cg20139683" "cg16178271" "cg07028768" "cg26474732" "cg00675157" "cg23916408" "cg05321907"
## [113] "cg17429539" "cg06950937" "cg14240646" "cg27086157" "cg25758034" "cg11247378" "cg19377607"
## [120] "cg07480176" "cg27577781" "cg11187460" "cg03071582" "cg12284872" "cg02932958" "cg12012426"
## [127] "cg06118351" "cg00696044" "cg25436480" "cg02320265" "cg11227702" "cg18819889" "cg06112204"
## [134] "cg19512141" "cg24506579" "cg00272795" "cg21697769" "cg12776173" "cg07138269" "cg17906851"
## [141] "cg08779649" "cg10985055" "cg08584917" "cg04664583" "cg01933473" "cg00689685" "cg14307563"
## [148] "cg12784167" "cg24851651" "cg15633912" "cg12466610" "cg16788319" "cg20678988" "cg01413796"
## [155] "cg01549082"
output_mean_process<-processed_data[,c("DX",selected_impAvg_ordered_NAME)]
print(head(output_mean_process))
## # A tibble: 6 × 156
##   DX            PC1 cg00962106        PC2 cg05096415 cg08857872 cg23432430 cg16652920 cg06864789
##   <fct>       <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 MCI      -0.214        0.912  0.0147         0.918      0.340      0.948      0.944     0.0537
## 2 CN       -0.173        0.538  0.0575         0.518      0.818      0.946      0.943     0.461 
## 3 CN       -0.00367      0.504  0.0837         0.629      0.297      0.942      0.946     0.875 
## 4 Dementia -0.187        0.904 -0.0112         0.606      0.295      0.943      0.942     0.490 
## 5 MCI       0.0268       0.896  0.0000165      0.560      0.894      0.946      0.953     0.479 
## 6 CN       -0.0379       0.886  0.0157         0.544      0.890      0.951      0.949     0.0542
## # ℹ 147 more variables: age.now <dbl>, cg01921484 <dbl>, cg26948066 <dbl>, cg17186592 <dbl>,
## #   cg09584650 <dbl>, cg12279734 <dbl>, cg02981548 <dbl>, cg14710850 <dbl>, PC3 <dbl>,
## #   cg07152869 <dbl>, cg08861434 <dbl>, cg15501526 <dbl>, cg25259265 <dbl>, cg02225060 <dbl>,
## #   cg24859648 <dbl>, cg11133939 <dbl>, cg25879395 <dbl>, cg02621446 <dbl>, cg00247094 <dbl>,
## #   cg02494911 <dbl>, cg16771215 <dbl>, cg24861747 <dbl>, cg01153376 <dbl>, cg04412904 <dbl>,
## #   cg20913114 <dbl>, cg01128042 <dbl>, cg10240127 <dbl>, cg14564293 <dbl>, cg16749614 <dbl>,
## #   cg01013522 <dbl>, cg16579946 <dbl>, cg03129555 <dbl>, cg02372404 <dbl>, cg05234269 <dbl>, …
dim(output_mean_process)
## [1] 648 156

Based on Median

Selected_median_imp <- head(combined_importance_quantiles,n = Number_fea_input)
print(head(Selected_median_imp))
##        Feature        0%       25%       50%       75%      100%
## 153        PC1 0.1240874 0.1866056 0.6666667 1.0000000 1.0000000
## 10  cg00962106 0.2500000 0.4661381 0.5279227 0.6281880 0.7275966
## 102 cg16652920 0.3449586 0.4472525 0.5000000 0.5184487 0.5211088
## 155        PC3 0.1650015 0.2153460 0.5000000 0.5045821 0.6786379
## 150 cg27452255 0.0000000 0.1809858 0.4871655 0.4910563 0.5000000
## 61  cg08861434 0.1426766 0.2358769 0.4830292 0.4931322 0.6666667
Selected_median_imp_Name<-Selected_median_imp$Feature
print(head(Selected_median_imp_Name))
## [1] "PC1"        "cg00962106" "cg16652920" "PC3"        "cg27452255" "cg08861434"
df_selected_Median <- processed_dataFrame[,c("DX",Selected_median_imp_Name)]
output_median_feature<-processed_data[,c("DX",Selected_median_imp_Name)]
  
print(head(df_selected_Median))
##                           DX          PC1 cg00962106 cg16652920          PC3 cg27452255
## 200223270003_R02C01      MCI -0.214185447  0.9124898  0.9436000 -0.014043316  0.9001010
## 200223270003_R03C01       CN -0.172761185  0.5375751  0.9431222  0.005055871  0.6593379
## 200223270003_R06C01       CN -0.003667305  0.5040948  0.9457161  0.029143653  0.9012217
## 200223270003_R07C01 Dementia -0.186779607  0.9039029  0.9419785 -0.032302430  0.8898635
## 200223270006_R01C01      MCI  0.026814649  0.8961556  0.9529417  0.052947950  0.5779792
## 200223270006_R04C01       CN -0.037862929  0.8857597  0.9492648 -0.008685676  0.8809143
##                     cg08861434 cg06864789 cg08857872 cg07152869 cg09584650 cg16749614  age.now
## 200223270003_R02C01  0.8768306 0.05369415  0.3395280  0.8284151 0.08230254  0.8678741 82.40000
## 200223270003_R03C01  0.4352647 0.46053125  0.8181845  0.5050630 0.09661586  0.8539348 78.60000
## 200223270003_R06C01  0.8698813 0.87513655  0.2970779  0.8352490 0.52399749  0.5874127 80.40000
## 200223270003_R07C01  0.4709249 0.49020327  0.2954090  0.5194300 0.11587211  0.5555391 78.16441
## 200223270006_R01C01  0.8618532 0.47852685  0.8935876  0.5025709 0.42115185  0.8026346 62.90000
## 200223270006_R04C01  0.9058965 0.05423587  0.8901338  0.8080916 0.56043178  0.7903978 80.67796
##                     cg05096415 cg23432430 cg01921484 cg02225060 cg02981548 cg14710850
## 200223270003_R02C01  0.9182527  0.9482702 0.90985496  0.6828159  0.1342571  0.8048592
## 200223270003_R03C01  0.5177819  0.9455418 0.90931369  0.8265195  0.5220037  0.8090950
## 200223270003_R06C01  0.6288426  0.9418716 0.92044873  0.5209552  0.5098965  0.8285902
## 200223270003_R07C01  0.6060271  0.9426559 0.91674311  0.8078889  0.5660985  0.8336457
## 200223270006_R01C01  0.5599588  0.9461736 0.02943747  0.6084903  0.5678714  0.8500725
## 200223270006_R04C01  0.5441200  0.9508404 0.89057041  0.7638781  0.5079859  0.8207247
##                     cg19503462           PC2 cg17186592 cg00247094 cg11133939 cg25259265
## 200223270003_R02C01  0.7951675  1.470293e-02  0.9230463  0.5399349  0.1282694  0.4356646
## 200223270003_R03C01  0.4537684  5.745834e-02  0.8593448  0.9315640  0.5920898  0.8893591
## 200223270003_R06C01  0.6997359  8.372861e-02  0.8467599  0.5177874  0.5127706  0.4201700
## 200223270003_R07C01  0.7189778 -1.117250e-02  0.4986373  0.5377765  0.8474176  0.4455517
## 200223270006_R01C01  0.7301755  1.650735e-05  0.8978999  0.9109309  0.8589133  0.8423337
## 200223270006_R04C01  0.4207207  1.571950e-02  0.9239750  0.5266535  0.5246557  0.8460736
##                     cg16715186 cg05570109 cg26948066 cg02494911 cg14293999 cg14924512
## 200223270003_R02C01  0.2742789  0.3466611  0.4685225  0.3049435  0.2836710  0.5303907
## 200223270003_R03C01  0.7946153  0.5866750  0.5026045  0.2416332  0.9172023  0.9160885
## 200223270003_R06C01  0.8124316  0.4046471  0.9101976  0.2520909  0.9168166  0.9088414
## 200223270003_R07C01  0.7773263  0.6014355  0.9379543  0.2457032  0.9188336  0.9081681
## 200223270006_R01C01  0.8334531  0.5774881  0.9120181  0.8045030  0.1971116  0.9111789
## 200223270006_R04C01  0.8039945  0.8756826  0.8868608  0.7489283  0.9030919  0.5331753
##                     cg02621446 cg03129555 cg04412904 cg26219488 cg00154902 cg20913114
## 200223270003_R02C01  0.8731313  0.6079616 0.05088595  0.9336638  0.5137741 0.36510482
## 200223270003_R03C01  0.8095534  0.5785498 0.07717659  0.9134707  0.8540746 0.80382984
## 200223270003_R06C01  0.7511582  0.9137818 0.08253743  0.9261878  0.8188126 0.03158439
## 200223270003_R07C01  0.8773609  0.9043041 0.06217431  0.9217866  0.4625776 0.81256840
## 200223270006_R01C01  0.2046541  0.9286357 0.11888769  0.4929692  0.4690086 0.81502059
## 200223270006_R04C01  0.7963817  0.9088564 0.08885846  0.9431574  0.4547219 0.90468830
##                     cg03084184 cg12279734 cg01153376 cg16771215 cg04248279 cg06536614
## 200223270003_R02C01  0.8162981  0.6435368  0.4872148 0.88389723  0.8534976  0.5824474
## 200223270003_R03C01  0.7877128  0.1494651  0.9639670 0.07196933  0.8458854  0.5746694
## 200223270003_R06C01  0.4546397  0.8760759  0.2242410 0.09949974  0.8332786  0.5773468
## 200223270003_R07C01  0.7812413  0.8674214  0.5155654 0.64234023  0.3303204  0.5848917
## 200223270006_R01C01  0.7818230  0.6454450  0.9588916 0.62679274  0.5966878  0.5669919
## 200223270006_R04C01  0.7725853  0.8660058  0.9586876 0.06970175  0.8939599  0.5718514
##                     cg09854620 cg06378561 cg24859648 cg10240127 cg12228670 cg03327352
## 200223270003_R02C01  0.5220587  0.9389306 0.83777536  0.9250553  0.8632174  0.8851712
## 200223270003_R03C01  0.8739646  0.9377503 0.44392797  0.9403255  0.8496212  0.8786878
## 200223270003_R06C01  0.8973149  0.5154019 0.03341185  0.9056974  0.8738949  0.3042310
## 200223270003_R07C01  0.8958863  0.9403569 0.43582347  0.9396217  0.8362189  0.8273211
## 200223270006_R01C01  0.9075331  0.4956816 0.03087161  0.9262370  0.8079694  0.8774082
## 200223270006_R04C01  0.9318820  0.9268832 0.02588024  0.9240497  0.6966666  0.8829492
##                     cg12146221 cg03982462 cg05841700 cg15865722 cg07523188 cg11227702
## 200223270003_R02C01  0.2049284  0.8562777  0.2923544 0.89438595  0.7509183 0.86486075
## 200223270003_R03C01  0.1814927  0.6023731  0.9146488 0.90194372  0.1524386 0.49184121
## 200223270003_R06C01  0.8619250  0.8778458  0.3737990 0.92118977  0.7127592 0.02543724
## 200223270003_R07C01  0.1238469  0.8860227  0.5046468 0.09230759  0.8464983 0.45150971
## 200223270006_R01C01  0.2021598  0.8703107  0.8419031 0.93422668  0.7847738 0.89086877
## 200223270006_R04C01  0.1383786  0.8792860  0.9286652 0.92220002  0.8231277 0.87675947
##                     cg10369879 cg16579946 cg24861747 cg14564293 cg01128042 cg00616572
## 200223270003_R02C01  0.9218784  0.6306315  0.3540897 0.52089591  0.9113420  0.9335067
## 200223270003_R03C01  0.3149306  0.6648766  0.4309505 0.04000662  0.5328806  0.9214079
## 200223270003_R06C01  0.9141081  0.6455081  0.8071462 0.04959460  0.5222757  0.9113633
## 200223270003_R07C01  0.9054415  0.8979650  0.3347317 0.03114773  0.5141721  0.9160238
## 200223270006_R01C01  0.2917862  0.6886498  0.3544795 0.51703196  0.9321215  0.4861334
## 200223270006_R04C01  0.9200403  0.6766907  0.5997840 0.51535010  0.5050081  0.9067928
##                     cg08198851 cg17421046 cg15535896 cg18339359 cg00322003 cg02372404
## 200223270003_R02C01  0.6578905  0.9026993  0.3382952  0.8824858  0.1759911 0.03598249
## 200223270003_R03C01  0.6578186  0.9112100  0.9253926  0.9040272  0.5702070 0.02767285
## 200223270003_R06C01  0.1272153  0.8952031  0.3320191  0.8552121  0.3077122 0.03127855
## 200223270003_R07C01  0.8351465  0.9268852  0.9409104  0.3073106  0.6104341 0.55685785
## 200223270006_R01C01  0.8791156  0.1118337  0.9326027  0.8973742  0.6147419 0.02587736
## 200223270006_R04C01  0.1423737  0.4174370  0.9156401  0.2292800  0.2293759 0.02828648
##                     cg11331837 cg23658987 cg10738648 cg25561557 cg01667144 cg05234269
## 200223270003_R02C01 0.03692842 0.79757644 0.44931577 0.76736369  0.8971484 0.93848584
## 200223270003_R03C01 0.57150125 0.07511718 0.49894016 0.03851635  0.3175389 0.57461229
## 200223270003_R06C01 0.03182862 0.10177571 0.05552024 0.47259480  0.9238364 0.02467208
## 200223270003_R07C01 0.03832164 0.46747992 0.03730440 0.43364249  0.8739442 0.56516794
## 200223270006_R01C01 0.93008298 0.76831297 0.54952781 0.46211439  0.2931961 0.94829529
## 200223270006_R04C01 0.54004452 0.08988532 0.59358167 0.44651530  0.8616530 0.56298286
##                     cg12534577 cg06118351 cg13885788 cg10750306 cg15775217 cg01013522
## 200223270003_R02C01  0.8585231 0.36339400  0.9380618 0.04919915  0.5707441  0.6251168
## 200223270003_R03C01  0.8493466 0.47148604  0.9369476 0.55160081  0.9168327  0.8862821
## 200223270003_R06C01  0.8395241 0.86559618  0.5163017 0.54694332  0.6042521  0.5425308
## 200223270003_R07C01  0.8511384 0.83494303  0.9183376 0.59824543  0.9062231  0.8429862
## 200223270006_R01C01  0.8804655 0.02632111  0.5525542 0.53158639  0.9083515  0.0480531
## 200223270006_R04C01  0.3029013 0.83329300  0.9328289 0.05646838  0.6383270  0.8240222
##                     cg26474732 cg27086157  cg03088219 cg15501526 cg27577781 cg11438323
## 200223270003_R02C01  0.7843252  0.9224112 0.844002862  0.6362531  0.8143535  0.4863471
## 200223270003_R03C01  0.8184088  0.9219304 0.007435243  0.6319253  0.8113185  0.8984559
## 200223270003_R06C01  0.7358417  0.3224986 0.120155222  0.7435100  0.8144274  0.8722772
## 200223270003_R07C01  0.7509296  0.3455486 0.826554308  0.7756577  0.7970617  0.5026756
## 200223270006_R01C01  0.8294938  0.8988962 0.066294915  0.3230777  0.8640044  0.8809646
## 200223270006_R04C01  0.8033167  0.9159217 0.574738383  0.8342695  0.8840237  0.8717937
##                     cg06715136 cg17738613 cg01680303 cg06697310 cg22274273 cg12738248
## 200223270003_R02C01  0.3400192  0.6879612  0.5095174  0.8454609  0.4209386 0.85430866
## 200223270003_R03C01  0.9259109  0.6582258  0.1344941  0.8653044  0.4246379 0.88010292
## 200223270003_R06C01  0.9079807  0.1022257  0.7573869  0.2405168  0.4196796 0.51121855
## 200223270003_R07C01  0.6782105  0.8960156  0.4772204  0.8479193  0.4164100 0.09131476
## 200223270006_R01C01  0.8369052  0.8850702  0.1176263  0.8206613  0.7951105 0.91529345
## 200223270006_R04C01  0.8807568  0.8481916  0.5133033  0.7839595  0.0229810 0.91911405
##                     cg21854924 cg14240646 cg03071582 cg24873924 cg17429539 cg06950937
## 200223270003_R02C01  0.8729132  0.5391334  0.9187811  0.3060635  0.7860900  0.8910968
## 200223270003_R03C01  0.7162342  0.2538363  0.5844421  0.8640985  0.7100923  0.2889345
## 200223270003_R06C01  0.7520990  0.1864902  0.6245558  0.8259149  0.7660838  0.9143801
## 200223270003_R07C01  0.8641284  0.6402007  0.9283683  0.8333940  0.6984969  0.8891079
## 200223270006_R01C01  0.6498895  0.7696079  0.5715416  0.8761177  0.6508597  0.8868617
## 200223270006_R04C01  0.5943113  0.1490028  0.6534650  0.8585363  0.2828452  0.9093273
##                     cg13080267 cg27272246 cg27341708 cg18821122 cg12682323 cg12012426
## 200223270003_R02C01 0.78936656  0.8615873 0.48846610  0.9291309  0.9397956  0.9165048
## 200223270003_R03C01 0.78371483  0.8705287 0.02613847  0.5901603  0.9003940  0.9434768
## 200223270003_R06C01 0.09436069  0.8103777 0.86893582  0.5779620  0.9157877  0.9220044
## 200223270003_R07C01 0.09351259  0.0310881 0.02642300  0.9251431  0.9048877  0.9241284
## 200223270006_R01C01 0.45173796  0.7686536 0.47573455  0.9217018  0.1065347  0.9327143
## 200223270006_R04C01 0.49866715  0.4403542 0.89411974  0.5412250  0.8836232  0.9271167
##                     cg05321907 cg20139683 cg20685672 cg26757229 cg25436480 cg23916408
## 200223270003_R02C01  0.2880477  0.8717075 0.67121006  0.6723726 0.84251599  0.1942275
## 200223270003_R03C01  0.1782629  0.9059433 0.79320906  0.1422661 0.49940321  0.9154993
## 200223270003_R06C01  0.8427929  0.8962554 0.66136456  0.7933794 0.34943119  0.8886255
## 200223270003_R07C01  0.8320504  0.9218012 0.80838304  0.8074830 0.85244913  0.8872447
## 200223270006_R01C01  0.2422218  0.1708472 0.08291414  0.5265692 0.44545117  0.2219945
## 200223270006_R04C01  0.2429551  0.1067122 0.84460055  0.7341953 0.02575036  0.1520624
##                     cg20507276 cg02356645 cg07028768 cg00272795 cg25758034 cg16178271
## 200223270003_R02C01 0.12238910  0.5105903  0.4496851 0.46365138  0.6114028  0.6445416
## 200223270003_R03C01 0.38721972  0.5833923  0.8536078 0.82839260  0.6649219  0.6178075
## 200223270003_R06C01 0.47978438  0.5701428  0.8356936 0.07231279  0.2393844  0.6641952
## 200223270003_R07C01 0.02261996  0.5683381  0.4245893 0.78303831  0.7071501  0.7148058
## 200223270006_R01C01 0.37465798  0.5233692  0.8835151 0.78219952  0.2301078  0.6138954
## 200223270006_R04C01 0.03570795  0.9188670  0.4514661 0.44408249  0.6891513  0.9414188
##                     cg27639199 cg11187460 cg21209485 cg14527649 cg23161429 cg19512141
## 200223270003_R02C01 0.67515415 0.03672179  0.8865053  0.2678912  0.8956965  0.8209161
## 200223270003_R03C01 0.67552763 0.92516409  0.8714878  0.7954683  0.9099619  0.7903543
## 200223270003_R06C01 0.06233093 0.03109553  0.2292550  0.8350610  0.8833895  0.8404684
## 200223270003_R07C01 0.05701332 0.53283119  0.2351526  0.8428684  0.9134709  0.2202759
## 200223270006_R01C01 0.05037694 0.54038146  0.8882046  0.8231348  0.8738558  0.8059589
## 200223270006_R04C01 0.08144161 0.91096169  0.2292483  0.8022444  0.9104210  0.7020247
##                     cg02320265 cg20370184 cg12284872 cg04664583 cg11247378 cg26069044
## 200223270003_R02C01  0.8853213 0.37710950  0.8008333  0.5572814  0.1591185 0.92401867
## 200223270003_R03C01  0.4686314 0.05737964  0.7414569  0.5881190  0.7874849 0.94072227
## 200223270003_R06C01  0.4838749 0.04740505  0.7725267  0.9352717  0.4807942 0.93321315
## 200223270003_R07C01  0.8986848 0.83572095  0.7573369  0.9350230  0.4537348 0.56567694
## 200223270006_R01C01  0.8987560 0.04056608  0.7201607  0.9424588  0.1537079 0.94369927
## 200223270006_R04C01  0.4768520 0.04038589  0.8021446  0.9379537  0.1686356 0.02040391
##                     cg25879395 cg00999469 cg06112204 cg02932958 cg19377607 cg12784167
## 200223270003_R02C01 0.88130864  0.3274080  0.5251592  0.7901008 0.05377464 0.81503498
## 200223270003_R03C01 0.02603438  0.2857719  0.8773488  0.4210489 0.90570746 0.02811410
## 200223270003_R06C01 0.91060615  0.2499229  0.8867975  0.3825995 0.06636174 0.03073269
## 200223270003_R07C01 0.89205942  0.2819622  0.5613799  0.7617081 0.68788639 0.84775699
## 200223270006_R01C01 0.47886249  0.2933539  0.9184122  0.8431126 0.06338988 0.83825789
## 200223270006_R04C01 0.02145248  0.2966623  0.9152514  0.7610084 0.91551446 0.45475291
##                     cg07480176 cg00696044 cg18819889 cg00689685 cg00675157 cg03660162
## 200223270003_R02C01  0.5171664 0.55608424  0.9156157  0.7019389  0.9188438  0.8691767
## 200223270003_R03C01  0.3760452 0.07552381  0.9004455  0.8634268  0.9242325  0.5160770
## 200223270003_R06C01  0.6998389 0.79270858  0.9054439  0.6378795  0.9254708  0.9026304
## 200223270003_R07C01  0.2189042 0.03548419  0.9089935  0.8624541  0.5447244  0.5305691
## 200223270006_R01C01  0.5570021 0.10714386  0.9065397  0.6361891  0.5173554  0.9257451
## 200223270006_R04C01  0.4501196 0.18420803  0.9242767  0.6356260  0.9247232  0.8935772
##                     cg10985055 cg07138269 cg21697769 cg08779649 cg01933473 cg17906851
## 200223270003_R02C01  0.8518169  0.5002290  0.8946108 0.44449401  0.2589014  0.9488392
## 200223270003_R03C01  0.8631895  0.9426707  0.2822953 0.45076825  0.6726133  0.9529718
## 200223270003_R06C01  0.5456633  0.5057781  0.8698740 0.04810217  0.2642560  0.6462151
## 200223270003_R07C01  0.8825100  0.9400527  0.9134887 0.42715969  0.1978068  0.9553497
## 200223270006_R01C01  0.8841690  0.9321602  0.2683820 0.89313476  0.7599441  0.6222117
## 200223270006_R04C01  0.8407797  0.9333501  0.2765740 0.59523771  0.7405661  0.6441202
##                     cg14307563 cg12776173 cg24851651 cg08584917 cg16788319 cg24506579
## 200223270003_R02C01  0.1855966 0.10388038 0.03674702  0.5663205  0.9379870  0.5244337
## 200223270003_R03C01  0.8916957 0.87306345 0.05358297  0.9019732  0.8913429  0.5794845
## 200223270003_R06C01  0.8750052 0.70094907 0.05968923  0.9187789  0.8680680  0.9427785
## 200223270003_R07C01  0.8975663 0.11367159 0.60864179  0.6007449  0.8811444  0.9323844
## 200223270006_R01C01  0.8762842 0.09458405 0.08825834  0.9069098  0.3123481  0.9185355
## 200223270006_R04C01  0.9168614 0.86532175 0.91932068  0.9263584  0.2995627  0.4332642
##                     cg01549082 cg12466610 cg15633912 cg01413796 cg20678988
## 200223270003_R02C01  0.2924138 0.05767659  0.1605530  0.1345128  0.8438718
## 200223270003_R03C01  0.7065693 0.59131778  0.9333421  0.2830672  0.8548886
## 200223270003_R06C01  0.2895440 0.06939623  0.8737362  0.8194681  0.7786685
## 200223270003_R07C01  0.6422955 0.04527733  0.9137334  0.9007710  0.8260541
## 200223270006_R01C01  0.8471236 0.05212904  0.9169706  0.2603027  0.3295384
## 200223270006_R04C01  0.6949888 0.05104033  0.8890004  0.9207672  0.8541667
dim(df_selected_Median)
## [1] 648 156
print(Selected_median_imp_Name)
##   [1] "PC1"        "cg00962106" "cg16652920" "PC3"        "cg27452255" "cg08861434" "cg06864789"
##   [8] "cg08857872" "cg07152869" "cg09584650" "cg16749614" "age.now"    "cg05096415" "cg23432430"
##  [15] "cg01921484" "cg02225060" "cg02981548" "cg14710850" "cg19503462" "PC2"        "cg17186592"
##  [22] "cg00247094" "cg11133939" "cg25259265" "cg16715186" "cg05570109" "cg26948066" "cg02494911"
##  [29] "cg14293999" "cg14924512" "cg02621446" "cg03129555" "cg04412904" "cg26219488" "cg00154902"
##  [36] "cg20913114" "cg03084184" "cg12279734" "cg01153376" "cg16771215" "cg04248279" "cg06536614"
##  [43] "cg09854620" "cg06378561" "cg24859648" "cg10240127" "cg12228670" "cg03327352" "cg12146221"
##  [50] "cg03982462" "cg05841700" "cg15865722" "cg07523188" "cg11227702" "cg10369879" "cg16579946"
##  [57] "cg24861747" "cg14564293" "cg01128042" "cg00616572" "cg08198851" "cg17421046" "cg15535896"
##  [64] "cg18339359" "cg00322003" "cg02372404" "cg11331837" "cg23658987" "cg10738648" "cg25561557"
##  [71] "cg01667144" "cg05234269" "cg12534577" "cg06118351" "cg13885788" "cg10750306" "cg15775217"
##  [78] "cg01013522" "cg26474732" "cg27086157" "cg03088219" "cg15501526" "cg27577781" "cg11438323"
##  [85] "cg06715136" "cg17738613" "cg01680303" "cg06697310" "cg22274273" "cg12738248" "cg21854924"
##  [92] "cg14240646" "cg03071582" "cg24873924" "cg17429539" "cg06950937" "cg13080267" "cg27272246"
##  [99] "cg27341708" "cg18821122" "cg12682323" "cg12012426" "cg05321907" "cg20139683" "cg20685672"
## [106] "cg26757229" "cg25436480" "cg23916408" "cg20507276" "cg02356645" "cg07028768" "cg00272795"
## [113] "cg25758034" "cg16178271" "cg27639199" "cg11187460" "cg21209485" "cg14527649" "cg23161429"
## [120] "cg19512141" "cg02320265" "cg20370184" "cg12284872" "cg04664583" "cg11247378" "cg26069044"
## [127] "cg25879395" "cg00999469" "cg06112204" "cg02932958" "cg19377607" "cg12784167" "cg07480176"
## [134] "cg00696044" "cg18819889" "cg00689685" "cg00675157" "cg03660162" "cg10985055" "cg07138269"
## [141] "cg21697769" "cg08779649" "cg01933473" "cg17906851" "cg14307563" "cg12776173" "cg24851651"
## [148] "cg08584917" "cg16788319" "cg24506579" "cg01549082" "cg12466610" "cg15633912" "cg01413796"
## [155] "cg20678988"
print(head(output_median_feature))
## # A tibble: 6 × 156
##   DX            PC1 cg00962106 cg16652920      PC3 cg27452255 cg08861434 cg06864789 cg08857872
##   <fct>       <dbl>      <dbl>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 MCI      -0.214        0.912      0.944 -0.0140       0.900      0.877     0.0537      0.340
## 2 CN       -0.173        0.538      0.943  0.00506      0.659      0.435     0.461       0.818
## 3 CN       -0.00367      0.504      0.946  0.0291       0.901      0.870     0.875       0.297
## 4 Dementia -0.187        0.904      0.942 -0.0323       0.890      0.471     0.490       0.295
## 5 MCI       0.0268       0.896      0.953  0.0529       0.578      0.862     0.479       0.894
## 6 CN       -0.0379       0.886      0.949 -0.00869      0.881      0.906     0.0542      0.890
## # ℹ 147 more variables: cg07152869 <dbl>, cg09584650 <dbl>, cg16749614 <dbl>, age.now <dbl>,
## #   cg05096415 <dbl>, cg23432430 <dbl>, cg01921484 <dbl>, cg02225060 <dbl>, cg02981548 <dbl>,
## #   cg14710850 <dbl>, cg19503462 <dbl>, PC2 <dbl>, cg17186592 <dbl>, cg00247094 <dbl>,
## #   cg11133939 <dbl>, cg25259265 <dbl>, cg16715186 <dbl>, cg05570109 <dbl>, cg26948066 <dbl>,
## #   cg02494911 <dbl>, cg14293999 <dbl>, cg14924512 <dbl>, cg02621446 <dbl>, cg03129555 <dbl>,
## #   cg04412904 <dbl>, cg26219488 <dbl>, cg00154902 <dbl>, cg20913114 <dbl>, cg03084184 <dbl>,
## #   cg12279734 <dbl>, cg01153376 <dbl>, cg16771215 <dbl>, cg04248279 <dbl>, cg06536614 <dbl>, …
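The median-based ranking above reads `combined_importance_quantiles` and keeps the rows with the largest 50% (median) importance. As a minimal illustrative sketch (not the pipeline's actual code; the toy data frame and its column names are hypothetical), such a quantile summary can be built from per-model normalized importances and ordered by the median column:

```r
# Illustrative sketch: build a per-feature quantile summary of importances
# across models, then rank features by the median (50%) column.
imp <- data.frame(
  Feature        = c("PC1", "cg00962106", "cg16652920"),
  Importance_LRM = c(0.90, 0.50, 0.40),
  Importance_XGB = c(0.20, 0.60, 0.50),
  Importance_RF  = c(0.70, 0.50, 0.52)
)
# Row-wise quantiles over the importance columns (one row per feature)
q <- t(apply(imp[, -1], 1, quantile, probs = c(0, 0.25, 0.50, 0.75, 1)))
quantile_table <- data.frame(Feature = imp$Feature, q, check.names = FALSE)
# Order by median importance, largest first
quantile_table <- quantile_table[order(-quantile_table$`50%`), ]
print(quantile_table)
```

With `check.names = FALSE`, the quantile labels ("0%", "25%", …) survive as column names, matching the layout of `Selected_median_imp` printed above.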

Based on Frequency

Function for Frequency Selection

Choose the mutually important features: a feature is kept when it appears in at least half of the models (i.e. 3 of our 5 models).

The frequency / common feature importance is computed as follows:

  1. Select the Top Number of features for each model. (This number is set by "Number_fea_input" in this session, where Number_fea_input <- INPUT_NUMBER_FEATURES and "INPUT_NUMBER_FEATURES" is defined in the INPUT session.)
  2. Calculate the frequency of appearance of each feature among the Top Number of features selected in step 1.
  3. Every feature that appears in at least half of the models is considered important; these important features are collected as the common features.
n_select_frequencyWay <- Number_fea_input
combined_importance_freq_ordered_df <- combined_importance_Avg_ordered
df_Selected_Frequency_Imp <- function(n_select_frequencyWay,FeatureImportanceTable){
# In this function, we input the feature importance data frame
# and process it with the steps described above.
# The output is the feature frequency table
#  (i.e. the frequency of appearance of each feature among the Top Number of features selected per model).
  
  
# LRM
## All_impAvg_orderby_LRM
All_impAvg_orderby_LRM <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_LRM1),]
## top_impAvg_orderby_LRM
top_impAvg_orderby_LRM <- head(All_impAvg_orderby_LRM,n = n_select_frequencyWay)
top_impAvg_orderby_LRM_NAME <- top_impAvg_orderby_LRM$Feature

# XGB
## All_impAvg_orderby_XGB
All_impAvg_orderby_XGB <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_XGB),]
## top_impAvg_orderby_XGB
top_impAvg_orderby_XGB <- head(All_impAvg_orderby_XGB,n = n_select_frequencyWay)
top_impAvg_orderby_XGB_NAME <- top_impAvg_orderby_XGB$Feature


# ENM
## All_impAvg_orderby_ENM
All_impAvg_orderby_ENM <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_ENM1),]
## top_impAvg_orderby_ENM
top_impAvg_orderby_ENM <- head(All_impAvg_orderby_ENM,n = n_select_frequencyWay)
top_impAvg_orderby_ENM_NAME <- top_impAvg_orderby_ENM$Feature


# RF
## All_impAvg_orderby_RF
All_impAvg_orderby_RF <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_RF),]
## top_impAvg_orderby_RF
top_impAvg_orderby_RF <- head(All_impAvg_orderby_RF,n = n_select_frequencyWay)
top_impAvg_orderby_RF_NAME <- top_impAvg_orderby_RF$Feature


# SVM
## All_impAvg_orderby_SVM
All_impAvg_orderby_SVM <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_SVM),]
## top_impAvg_orderby_SVM
top_impAvg_orderby_SVM <- head(All_impAvg_orderby_SVM,n = n_select_frequencyWay)
top_impAvg_orderby_SVM_NAME <- top_impAvg_orderby_SVM$Feature


# Combine all features into a unique collection
all_features <- unique(c(top_impAvg_orderby_LRM_NAME, top_impAvg_orderby_XGB_NAME, top_impAvg_orderby_ENM_NAME,top_impAvg_orderby_RF_NAME,top_impAvg_orderby_SVM_NAME))

models<-c("LRM","XGB","ENM","RF","SVM")
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models), 
                         dimnames = list(all_features, models))

# Fill the dataframe indicating presence (1) or absence (0) of each feature in each model
for (feature in all_features) {
  feature_matrix[feature, "LRM"] <- 
    as.integer(feature %in% top_impAvg_orderby_LRM_NAME)
  feature_matrix[feature, "XGB"] <- 
    as.integer(feature %in% top_impAvg_orderby_XGB_NAME)
  feature_matrix[feature, "ENM"] <- 
    as.integer(feature %in% top_impAvg_orderby_ENM_NAME)
  feature_matrix[feature, "RF"] <- 
    as.integer(feature %in% top_impAvg_orderby_RF_NAME)
  feature_matrix[feature, "SVM"] <- 
    as.integer(feature %in% top_impAvg_orderby_SVM_NAME)
}

# Convert the matrix to a data frame
feature_df <- as.data.frame(feature_matrix)

feature_df$Total_Count <- rowSums(feature_df[,1:5])
# Sort the dataframe by the Total_Count in descending order
feature_df <- feature_df[order(-feature_df$Total_Count), ]
print(feature_df)
return(feature_df)
}

Now the function is tested below:

df_Func_test<-df_Selected_Frequency_Imp(NUM_COMMON_FEATURES_SET_Frequency,combined_importance_freq_ordered_df)
##            LRM XGB ENM RF SVM Total_Count
## cg00962106   1   1   1  1   0           4
## PC1          1   0   1  0   1           3
## cg14710850   1   0   1  1   0           3
## cg02981548   1   1   1  0   0           3
## cg08861434   1   0   1  0   1           3
## cg07152869   1   0   1  0   1           3
## cg05096415   1   1   0  0   1           3
## cg23432430   1   0   1  0   1           3
## cg17186592   1   0   0  1   1           3
## cg09584650   1   1   1  0   0           3
## age.now      0   1   0  1   1           3
## cg16652920   0   1   1  1   0           3
## cg06864789   0   1   1  1   0           3
## cg08857872   0   1   1  1   0           3
## cg01921484   0   1   0  1   1           3
## cg26948066   0   1   1  0   1           3
## PC2          1   0   1  0   0           2
## PC3          1   0   1  0   0           2
## cg02225060   1   0   1  0   0           2
## cg27452255   1   0   1  0   0           2
## cg19503462   1   0   1  0   0           2
## cg16749614   1   0   1  0   0           2
## cg11133939   1   0   1  0   0           2
## cg15501526   0   1   0  1   0           2
## cg25259265   0   1   0  1   0           2
## cg01128042   0   1   0  0   1           2
## cg02494911   0   1   0  1   0           2
## cg12279734   0   0   0  1   1           2
## cg00247094   1   0   0  0   0           1
## cg16715186   1   0   0  0   0           1
## cg03129555   1   0   0  0   0           1
## cg14564293   0   1   0  0   0           1
## cg04412904   0   1   0  0   0           1
## cg16771215   0   1   0  0   0           1
## cg02621446   0   1   0  0   0           1
## cg15865722   0   1   0  0   0           1
## cg03327352   0   1   0  0   0           1
## cg02372404   0   0   1  0   0           1
## cg01153376   0   0   0  1   0           1
## cg23658987   0   0   0  1   0           1
## cg14293999   0   0   0  1   0           1
## cg05570109   0   0   0  1   0           1
## cg21209485   0   0   0  1   0           1
## cg16579946   0   0   0  1   0           1
## cg14924512   0   0   0  1   0           1
## cg07523188   0   0   0  1   0           1
## cg25879395   0   0   0  0   1           1
## cg26757229   0   0   0  0   1           1
## cg26069044   0   0   0  0   1           1
## cg00999469   0   0   0  0   1           1
## cg24861747   0   0   0  0   1           1
## cg01013522   0   0   0  0   1           1
## cg05234269   0   0   0  0   1           1
## cg00616572   0   0   0  0   1           1
## cg01680303   0   0   0  0   1           1
# The expected output should be zero.
sum(df_Func_test!=frequency_feature_df_RAW_ordered)
## [1] 0
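The function above returns the full frequency table; the final filtering of step 3 (keeping features that appear in at least half of the models) can be sketched as follows. The toy `feature_df` here is a hypothetical stand-in for the table returned by df_Selected_Frequency_Imp():

```r
# Minimal sketch of step 3: keep features whose Total_Count reaches at
# least half the number of models (ceiling(5 / 2) = 3 in our case).
feature_df <- data.frame(
  Total_Count = c(4, 3, 2, 1),
  row.names   = c("cg00962106", "PC1", "PC2", "cg00247094")
)
n_models  <- 5
threshold <- ceiling(n_models / 2)   # 3
common_features <- rownames(feature_df)[feature_df$Total_Count >= threshold]
print(common_features)               # "cg00962106" "PC1"
```

Features below the threshold are dropped; the survivors form the common-feature set passed on as one of the output data frames.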

Selected data frame based on Frequency for Output

Choose the mutually important features: a feature is kept when it appears in at least half of the models (i.e. 3 of our 5 models).

The frequency / common feature importance is computed as follows:

  1. Select the Top Number of features for each model. (This number is set by "Number_fea_input" in this session, where Number_fea_input <- INPUT_NUMBER_FEATURES and "INPUT_NUMBER_FEATURES" is defined in the INPUT session.)
  2. Calculate the frequency of appearance of each feature among the Top Number of features selected in step 1.
  3. Every feature that appears in at least half of the models is considered important; these important features are collected as the common features.
n_select_frequencyWay <- Number_fea_input
df_feature_Output_frequency <- df_Selected_Frequency_Imp(Number_fea_input,
                                                         combined_importance_freq_ordered_df)
##            LRM XGB ENM RF SVM Total_Count
## PC1          1   1   1  1   1           5
## PC2          1   1   1  1   1           5
## PC3          1   1   1  1   1           5
## cg00962106   1   1   1  1   1           5
## cg02225060   1   1   1  1   1           5
## cg14710850   1   1   1  1   1           5
## cg27452255   1   1   1  1   1           5
## cg02981548   1   1   1  1   1           5
## cg08861434   1   1   1  1   1           5
## cg19503462   1   1   1  1   1           5
## cg07152869   1   1   1  1   1           5
## cg16749614   1   1   1  1   1           5
## cg05096415   1   1   1  1   1           5
## cg23432430   1   1   1  1   1           5
## cg17186592   1   1   1  1   1           5
## cg00247094   1   1   1  1   1           5
## cg09584650   1   1   1  1   1           5
## cg11133939   1   1   1  1   1           5
## cg16715186   1   1   1  1   1           5
## cg03129555   1   1   1  1   1           5
## cg08857872   1   1   1  1   1           5
## cg06864789   1   1   1  1   1           5
## cg14924512   1   1   1  1   1           5
## cg16652920   1   1   1  1   1           5
## cg03084184   1   1   1  1   1           5
## cg26219488   1   1   1  1   1           5
## cg20913114   1   1   1  1   1           5
## cg06378561   1   1   1  1   1           5
## cg26948066   1   1   1  1   1           5
## cg25259265   1   1   1  1   1           5
## cg06536614   1   1   1  1   1           5
## cg24859648   1   1   1  1   1           5
## cg12279734   1   1   1  1   1           5
## cg03982462   1   1   1  1   1           5
## cg05841700   1   1   1  1   1           5
## cg11227702   1   1   1  1   1           5
## cg12146221   1   1   1  1   1           5
## cg02621446   1   1   1  1   1           5
## cg00616572   1   1   1  1   1           5
## cg15535896   1   1   1  1   1           5
## cg02372404   1   1   1  1   1           5
## cg09854620   1   1   1  1   1           5
## cg04248279   1   1   1  1   1           5
## cg20678988   1   1   1  1   1           5
## cg24861747   1   1   1  1   1           5
## cg10240127   1   1   1  1   1           5
## cg16771215   1   1   1  1   1           5
## cg01667144   1   1   1  1   1           5
## cg13080267   1   1   1  1   1           5
## cg02494911   1   1   1  1   1           5
## cg10750306   1   1   1  1   1           5
## cg11438323   1   1   1  1   1           5
## cg06715136   1   1   1  1   1           5
## cg04412904   1   1   1  1   1           5
## cg12738248   1   1   1  1   1           5
## cg03071582   1   1   1  1   1           5
## cg05570109   1   1   1  1   1           5
## cg15775217   1   1   1  1   1           5
## cg24873924   1   1   1  1   1           5
## cg17738613   1   1   1  1   1           5
## cg01921484   1   1   1  1   1           5
## cg10369879   1   1   1  1   1           5
## cg27341708   1   1   1  1   1           5
## cg12534577   1   1   1  1   1           5
## cg18821122   1   1   1  1   1           5
## cg12682323   1   1   1  1   1           5
## cg05234269   1   1   1  1   1           5
## cg20685672   1   1   1  1   1           5
## cg12228670   1   1   1  1   1           5
## cg11331837   1   1   1  1   1           5
## cg01680303   1   1   1  1   1           5
## cg17421046   1   1   1  1   1           5
## cg03088219   1   1   1  1   1           5
## cg02356645   1   1   1  1   1           5
## cg00322003   1   1   1  1   1           5
## cg01013522   1   1   1  1   1           5
## cg00272795   1   1   1  1   1           5
## cg25758034   1   1   1  1   1           5
## cg26474732   1   1   1  1   1           5
## cg16579946   1   1   1  1   1           5
## cg07523188   1   1   1  1   1           5
## cg11187460   1   1   1  1   1           5
## cg14527649   1   1   1  1   1           5
## cg20370184   1   1   1  1   1           5
## cg17429539   1   1   1  1   1           5
## cg20507276   1   1   1  1   1           5
## cg13885788   1   1   1  1   1           5
## cg16178271   1   1   1  1   1           5
## cg10738648   1   1   1  1   1           5
## cg26069044   1   1   1  1   1           5
## cg25879395   1   1   1  1   1           5
## cg06112204   1   1   1  1   1           5
## cg23161429   1   1   1  1   1           5
## cg25436480   1   1   1  1   1           5
## cg26757229   1   1   1  1   1           5
## cg02932958   1   1   1  1   1           5
## cg18339359   1   1   1  1   1           5
## cg23916408   1   1   1  1   1           5
## cg06950937   1   1   1  1   1           5
## cg12784167   1   1   1  1   1           5
## cg07480176   1   1   1  1   1           5
## cg15865722   1   1   1  1   1           5
## cg27577781   1   1   1  1   1           5
## cg05321907   1   1   1  1   1           5
## cg03660162   1   1   1  1   1           5
## cg07138269   1   1   1  1   1           5
## cg20139683   1   1   1  1   1           5
## cg12284872   1   1   1  1   1           5
## cg03327352   1   1   1  1   1           5
## cg23658987   1   1   1  1   1           5
## cg21854924   1   1   1  1   1           5
## cg21697769   1   1   1  1   1           5
## cg19512141   1   1   1  1   1           5
## cg08198851   1   1   1  1   1           5
## cg00675157   1   1   1  1   1           5
## cg01153376   1   1   1  1   1           5
## cg01933473   1   1   1  1   1           5
## cg12776173   1   1   1  1   1           5
## cg14564293   1   1   1  1   1           5
## cg24851651   1   1   1  1   1           5
## cg22274273   1   1   1  1   1           5
## cg25561557   1   1   1  1   1           5
## cg21209485   1   1   1  1   1           5
## cg10985055   1   1   1  1   1           5
## cg14293999   1   1   1  1   1           5
## cg18819889   1   1   1  1   1           5
## cg24506579   1   1   1  1   1           5
## cg19377607   1   1   1  1   1           5
## cg06697310   1   1   1  1   1           5
## cg00696044   1   1   1  1   1           5
## cg01549082   1   1   1  1   1           5
## cg01128042   1   1   1  1   1           5
## cg00999469   1   1   1  1   1           5
## cg06118351   1   1   1  1   1           5
## cg12012426   1   1   1  1   1           5
## cg08584917   1   1   1  1   1           5
## cg27272246   1   1   1  1   1           5
## cg15633912   1   1   1  1   1           5
## cg16788319   1   1   1  1   1           5
## cg17906851   1   1   1  1   1           5
## cg07028768   1   1   1  1   1           5
## cg27086157   1   1   1  1   1           5
## cg14240646   1   1   1  1   1           5
## cg00154902   1   1   1  1   1           5
## cg14307563   1   1   1  1   1           5
## cg02320265   1   1   1  1   1           5
## cg08779649   1   1   1  1   1           5
## cg04664583   1   1   1  1   1           5
## cg12466610   1   1   1  1   1           5
## cg27639199   1   1   1  1   1           5
## cg15501526   1   1   1  1   1           5
## cg00689685   1   1   1  1   1           5
## cg01413796   1   1   1  1   1           5
## cg11247378   1   1   1  1   1           5
## age.now      1   1   1  1   1           5
Combine these counts with the feature importance data frame:
# Union of the features in the importance data frame and in the frequency table.
# Note: the combined importance data frame used here is the one before filtering.
all_out_features <- union(combined_importance_freq_ordered_df$Feature,
                          rownames(df_feature_Output_frequency))

# Combine them based on the common feature selection method:
# if a feature from the importance data frame is absent from the frequency
# table, add it with a count of zero.
feature_output_df_full <- data.frame(Feature = all_out_features)
feature_output_df_full <- merge(feature_output_df_full, df_feature_Output_frequency,
                                by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_output_df_full[is.na(feature_output_df_full)] <- 0


# For top_impAvg_ordered: keep every feature in the union, filling missing
# importances with zero. (The full feature list must be the x side of the
# merge with all.x = TRUE; otherwise features absent from the importance
# data frame are dropped instead of zero-filled.)
all_output_impAvg_ordered_full <- data.frame(Feature = all_out_features)
all_output_impAvg_ordered_full <- merge(all_output_impAvg_ordered_full,
                                        combined_importance_freq_ordered_df,
                                        by = "Feature",
                                        all.x = TRUE)
all_output_impAvg_ordered_full[is.na(all_output_impAvg_ordered_full)] <- 0
all_Output_combined_df_impAvg <- merge(feature_output_df_full, 
                                all_output_impAvg_ordered_full, 
                                by = "Feature", 
                                all = TRUE)

print(head(feature_output_df_full))
##      Feature LRM XGB ENM RF SVM Total_Count
## 1    age.now   1   1   1  1   1           5
## 2 cg00154902   1   1   1  1   1           5
## 3 cg00247094   1   1   1  1   1           5
## 4 cg00272795   1   1   1  1   1           5
## 5 cg00322003   1   1   1  1   1           5
## 6 cg00616572   1   1   1  1   1           5
print(head(all_output_impAvg_ordered_full))
##      Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 1    age.now      0.00000000      1.0000000       0.0000000    0.45372158      0.8333333
## 2 cg00154902      0.08879263      0.2688349       0.3713159    0.33502754      0.5833333
## 3 cg00247094      0.41278095      0.2185408       0.4245031    0.23013585      0.5833333
## 4 cg00272795      0.21295491      0.1985510       0.2309999    0.09024509      0.3333333
## 5 cg00322003      0.21752832      0.1465702       0.3430531    0.27821774      0.5833333
## 6 cg00616572      0.28381319      0.1715595       0.3572845    0.17891065      0.6666667
##   Average_Importance
## 1          0.4574110
## 2          0.3294609
## 3          0.3738588
## 4          0.2132168
## 5          0.3137405
## 6          0.3316469
print(head(all_Output_combined_df_impAvg))
##      Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1
## 1    age.now   1   1   1  1   1           5      0.00000000      1.0000000       0.0000000
## 2 cg00154902   1   1   1  1   1           5      0.08879263      0.2688349       0.3713159
## 3 cg00247094   1   1   1  1   1           5      0.41278095      0.2185408       0.4245031
## 4 cg00272795   1   1   1  1   1           5      0.21295491      0.1985510       0.2309999
## 5 cg00322003   1   1   1  1   1           5      0.21752832      0.1465702       0.3430531
## 6 cg00616572   1   1   1  1   1           5      0.28381319      0.1715595       0.3572845
##   Importance_RF Importance_SVM Average_Importance
## 1    0.45372158      0.8333333          0.4574110
## 2    0.33502754      0.5833333          0.3294609
## 3    0.23013585      0.5833333          0.3738588
## 4    0.09024509      0.3333333          0.2132168
## 5    0.27821774      0.5833333          0.3137405
## 6    0.17891065      0.6666667          0.3316469
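The merge-and-zero-fill pattern used above can be sketched on toy data (the feature names and values below are hypothetical, chosen only for illustration):

```r
# Toy importance table that is missing one feature from the union list.
all_feats <- c("cgA", "cgB", "cgC")
imp_df <- data.frame(Feature = c("cgA", "cgB"), Importance = c(0.8, 0.3))

# Left-join the full feature list, then set the missing importance to zero.
full_df <- merge(data.frame(Feature = all_feats), imp_df,
                 by = "Feature", all.x = TRUE)
full_df$Importance[is.na(full_df$Importance)] <- 0
print(full_df)
```

Here `cgC` survives the merge with `Importance = 0`, which is exactly the behavior the comment above describes for features absent from the importance data frame.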
Frequency Feature Selection

Choose the mutual (common) important features: a feature is kept when it appears in the top selected important-feature lists of at least half of the models (i.e., at least 3 of the 5 models in our case).
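This selection rule can be illustrated on a toy frequency table (hypothetical feature names and 0/1 indicators, not the real data):

```r
# Toy frequency table: 1 = the feature appeared in that model's top list.
toy_freq <- data.frame(
  LRM = c(1, 1, 0), XGB = c(1, 0, 0), ENM = c(1, 1, 0),
  RF  = c(1, 0, 1), SVM = c(1, 1, 0),
  row.names = c("cgA", "cgB", "cgC")
)
toy_freq$Total_Count <- rowSums(toy_freq)

# Keep features appearing in at least half of the 5 models (>= 3).
common_features <- rownames(toy_freq[toy_freq$Total_Count >= 3, ])
print(common_features)  # "cgA" "cgB"
```

`cgA` (5 models) and `cgB` (3 models) pass the threshold, while `cgC` (1 model) is discarded.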

if (METHOD_FEATURE_FLAG %in% c(1, 3, 4, 5, 6)) {
  # Features appearing in the top lists of at least half of the 5 models (>= 3).
  df_process_frequency_FeatureName <- rownames(
    df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ]
  )

  # Look up the processed data sets corresponding to the chosen method flag.
  processed_data_flag    <- get(paste0("processed_data_m", METHOD_FEATURE_FLAG))
  processed_data_flag_df <- get(paste0("processed_data_m", METHOD_FEATURE_FLAG, "_df"))

  df_process_Output_freq   <- processed_data_flag_df[, c("DX", df_process_frequency_FeatureName)]
  output_Frequency_Feature <- processed_data_flag[, c("DX", df_process_frequency_FeatureName)]

  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:",
              length(df_process_frequency_FeatureName)))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
## # A tibble: 6 × 156
##   DX            PC1        PC2      PC3 cg00962106 cg02225060 cg14710850 cg27452255 cg02981548
##   <fct>       <dbl>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 MCI      -0.214    0.0147    -0.0140       0.912      0.683      0.805      0.900      0.134
## 2 CN       -0.173    0.0575     0.00506      0.538      0.827      0.809      0.659      0.522
## 3 CN       -0.00367  0.0837     0.0291       0.504      0.521      0.829      0.901      0.510
## 4 Dementia -0.187   -0.0112    -0.0323       0.904      0.808      0.834      0.890      0.566
## 5 MCI       0.0268   0.0000165  0.0529       0.896      0.608      0.850      0.578      0.568
## 6 CN       -0.0379   0.0157    -0.00869      0.886      0.764      0.821      0.881      0.508
## # ℹ 147 more variables: cg08861434 <dbl>, cg19503462 <dbl>, cg07152869 <dbl>, cg16749614 <dbl>,
## #   cg05096415 <dbl>, cg23432430 <dbl>, cg17186592 <dbl>, cg00247094 <dbl>, cg09584650 <dbl>,
## #   cg11133939 <dbl>, cg16715186 <dbl>, cg03129555 <dbl>, cg08857872 <dbl>, cg06864789 <dbl>,
## #   cg14924512 <dbl>, cg16652920 <dbl>, cg03084184 <dbl>, cg26219488 <dbl>, cg20913114 <dbl>,
## #   cg06378561 <dbl>, cg26948066 <dbl>, cg25259265 <dbl>, cg06536614 <dbl>, cg24859648 <dbl>,
## #   cg12279734 <dbl>, cg03982462 <dbl>, cg05841700 <dbl>, cg11227702 <dbl>, cg12146221 <dbl>,
## #   cg02621446 <dbl>, cg00616572 <dbl>, cg15535896 <dbl>, cg02372404 <dbl>, cg09854620 <dbl>, …
## [1] "The number of final used features of common importance method: 155"
##   [1] "PC1"        "PC2"        "PC3"        "cg00962106" "cg02225060" "cg14710850" "cg27452255"
##   [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
##  [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555" "cg08857872"
##  [22] "cg06864789" "cg14924512" "cg16652920" "cg03084184" "cg26219488" "cg20913114" "cg06378561"
##  [29] "cg26948066" "cg25259265" "cg06536614" "cg24859648" "cg12279734" "cg03982462" "cg05841700"
##  [36] "cg11227702" "cg12146221" "cg02621446" "cg00616572" "cg15535896" "cg02372404" "cg09854620"
##  [43] "cg04248279" "cg20678988" "cg24861747" "cg10240127" "cg16771215" "cg01667144" "cg13080267"
##  [50] "cg02494911" "cg10750306" "cg11438323" "cg06715136" "cg04412904" "cg12738248" "cg03071582"
##  [57] "cg05570109" "cg15775217" "cg24873924" "cg17738613" "cg01921484" "cg10369879" "cg27341708"
##  [64] "cg12534577" "cg18821122" "cg12682323" "cg05234269" "cg20685672" "cg12228670" "cg11331837"
##  [71] "cg01680303" "cg17421046" "cg03088219" "cg02356645" "cg00322003" "cg01013522" "cg00272795"
##  [78] "cg25758034" "cg26474732" "cg16579946" "cg07523188" "cg11187460" "cg14527649" "cg20370184"
##  [85] "cg17429539" "cg20507276" "cg13885788" "cg16178271" "cg10738648" "cg26069044" "cg25879395"
##  [92] "cg06112204" "cg23161429" "cg25436480" "cg26757229" "cg02932958" "cg18339359" "cg23916408"
##  [99] "cg06950937" "cg12784167" "cg07480176" "cg15865722" "cg27577781" "cg05321907" "cg03660162"
## [106] "cg07138269" "cg20139683" "cg12284872" "cg03327352" "cg23658987" "cg21854924" "cg21697769"
## [113] "cg19512141" "cg08198851" "cg00675157" "cg01153376" "cg01933473" "cg12776173" "cg14564293"
## [120] "cg24851651" "cg22274273" "cg25561557" "cg21209485" "cg10985055" "cg14293999" "cg18819889"
## [127] "cg24506579" "cg19377607" "cg06697310" "cg00696044" "cg01549082" "cg01128042" "cg00999469"
## [134] "cg06118351" "cg12012426" "cg08584917" "cg27272246" "cg15633912" "cg16788319" "cg17906851"
## [141] "cg07028768" "cg27086157" "cg14240646" "cg00154902" "cg14307563" "cg02320265" "cg08779649"
## [148] "cg04664583" "cg12466610" "cg27639199" "cg15501526" "cg00689685" "cg01413796" "cg11247378"
## [155] "age.now"   
##                           DX          PC1           PC2          PC3 cg00962106 cg02225060
## 200223270003_R02C01      MCI -0.214185447  1.470293e-02 -0.014043316  0.9124898  0.6828159
## 200223270003_R03C01       CN -0.172761185  5.745834e-02  0.005055871  0.5375751  0.8265195
## 200223270003_R06C01       CN -0.003667305  8.372861e-02  0.029143653  0.5040948  0.5209552
## 200223270003_R07C01 Dementia -0.186779607 -1.117250e-02 -0.032302430  0.9039029  0.8078889
## 200223270006_R01C01      MCI  0.026814649  1.650735e-05  0.052947950  0.8961556  0.6084903
## 200223270006_R04C01       CN -0.037862929  1.571950e-02 -0.008685676  0.8857597  0.7638781
##                     cg14710850 cg27452255 cg02981548 cg08861434 cg19503462 cg07152869
## 200223270003_R02C01  0.8048592  0.9001010  0.1342571  0.8768306  0.7951675  0.8284151
## 200223270003_R03C01  0.8090950  0.6593379  0.5220037  0.4352647  0.4537684  0.5050630
## 200223270003_R06C01  0.8285902  0.9012217  0.5098965  0.8698813  0.6997359  0.8352490
## 200223270003_R07C01  0.8336457  0.8898635  0.5660985  0.4709249  0.7189778  0.5194300
## 200223270006_R01C01  0.8500725  0.5779792  0.5678714  0.8618532  0.7301755  0.5025709
## 200223270006_R04C01  0.8207247  0.8809143  0.5079859  0.9058965  0.4207207  0.8080916
##                     cg16749614 cg05096415 cg23432430 cg17186592 cg00247094 cg09584650
## 200223270003_R02C01  0.8678741  0.9182527  0.9482702  0.9230463  0.5399349 0.08230254
## 200223270003_R03C01  0.8539348  0.5177819  0.9455418  0.8593448  0.9315640 0.09661586
## 200223270003_R06C01  0.5874127  0.6288426  0.9418716  0.8467599  0.5177874 0.52399749
## 200223270003_R07C01  0.5555391  0.6060271  0.9426559  0.4986373  0.5377765 0.11587211
## 200223270006_R01C01  0.8026346  0.5599588  0.9461736  0.8978999  0.9109309 0.42115185
## 200223270006_R04C01  0.7903978  0.5441200  0.9508404  0.9239750  0.5266535 0.56043178
##                     cg11133939 cg16715186 cg03129555 cg08857872 cg06864789 cg14924512
## 200223270003_R02C01  0.1282694  0.2742789  0.6079616  0.3395280 0.05369415  0.5303907
## 200223270003_R03C01  0.5920898  0.7946153  0.5785498  0.8181845 0.46053125  0.9160885
## 200223270003_R06C01  0.5127706  0.8124316  0.9137818  0.2970779 0.87513655  0.9088414
## 200223270003_R07C01  0.8474176  0.7773263  0.9043041  0.2954090 0.49020327  0.9081681
## 200223270006_R01C01  0.8589133  0.8334531  0.9286357  0.8935876 0.47852685  0.9111789
## 200223270006_R04C01  0.5246557  0.8039945  0.9088564  0.8901338 0.05423587  0.5331753
##                     cg16652920 cg03084184 cg26219488 cg20913114 cg06378561 cg26948066
## 200223270003_R02C01  0.9436000  0.8162981  0.9336638 0.36510482  0.9389306  0.4685225
## 200223270003_R03C01  0.9431222  0.7877128  0.9134707 0.80382984  0.9377503  0.5026045
## 200223270003_R06C01  0.9457161  0.4546397  0.9261878 0.03158439  0.5154019  0.9101976
## 200223270003_R07C01  0.9419785  0.7812413  0.9217866 0.81256840  0.9403569  0.9379543
## 200223270006_R01C01  0.9529417  0.7818230  0.4929692 0.81502059  0.4956816  0.9120181
## 200223270006_R04C01  0.9492648  0.7725853  0.9431574 0.90468830  0.9268832  0.8868608
##                     cg25259265 cg06536614 cg24859648 cg12279734 cg03982462 cg05841700
## 200223270003_R02C01  0.4356646  0.5824474 0.83777536  0.6435368  0.8562777  0.2923544
## 200223270003_R03C01  0.8893591  0.5746694 0.44392797  0.1494651  0.6023731  0.9146488
## 200223270003_R06C01  0.4201700  0.5773468 0.03341185  0.8760759  0.8778458  0.3737990
## 200223270003_R07C01  0.4455517  0.5848917 0.43582347  0.8674214  0.8860227  0.5046468
## 200223270006_R01C01  0.8423337  0.5669919 0.03087161  0.6454450  0.8703107  0.8419031
## 200223270006_R04C01  0.8460736  0.5718514 0.02588024  0.8660058  0.8792860  0.9286652
##                     cg11227702 cg12146221 cg02621446 cg00616572 cg15535896 cg02372404
## 200223270003_R02C01 0.86486075  0.2049284  0.8731313  0.9335067  0.3382952 0.03598249
## 200223270003_R03C01 0.49184121  0.1814927  0.8095534  0.9214079  0.9253926 0.02767285
## 200223270003_R06C01 0.02543724  0.8619250  0.7511582  0.9113633  0.3320191 0.03127855
## 200223270003_R07C01 0.45150971  0.1238469  0.8773609  0.9160238  0.9409104 0.55685785
## 200223270006_R01C01 0.89086877  0.2021598  0.2046541  0.4861334  0.9326027 0.02587736
## 200223270006_R04C01 0.87675947  0.1383786  0.7963817  0.9067928  0.9156401 0.02828648
##                     cg09854620 cg04248279 cg20678988 cg24861747 cg10240127 cg16771215
## 200223270003_R02C01  0.5220587  0.8534976  0.8438718  0.3540897  0.9250553 0.88389723
## 200223270003_R03C01  0.8739646  0.8458854  0.8548886  0.4309505  0.9403255 0.07196933
## 200223270003_R06C01  0.8973149  0.8332786  0.7786685  0.8071462  0.9056974 0.09949974
## 200223270003_R07C01  0.8958863  0.3303204  0.8260541  0.3347317  0.9396217 0.64234023
## 200223270006_R01C01  0.9075331  0.5966878  0.3295384  0.3544795  0.9262370 0.62679274
## 200223270006_R04C01  0.9318820  0.8939599  0.8541667  0.5997840  0.9240497 0.06970175
##                     cg01667144 cg13080267 cg02494911 cg10750306 cg11438323 cg06715136
## 200223270003_R02C01  0.8971484 0.78936656  0.3049435 0.04919915  0.4863471  0.3400192
## 200223270003_R03C01  0.3175389 0.78371483  0.2416332 0.55160081  0.8984559  0.9259109
## 200223270003_R06C01  0.9238364 0.09436069  0.2520909 0.54694332  0.8722772  0.9079807
## 200223270003_R07C01  0.8739442 0.09351259  0.2457032 0.59824543  0.5026756  0.6782105
## 200223270006_R01C01  0.2931961 0.45173796  0.8045030 0.53158639  0.8809646  0.8369052
## 200223270006_R04C01  0.8616530 0.49866715  0.7489283 0.05646838  0.8717937  0.8807568
##                     cg04412904 cg12738248 cg03071582 cg05570109 cg15775217 cg24873924
## 200223270003_R02C01 0.05088595 0.85430866  0.9187811  0.3466611  0.5707441  0.3060635
## 200223270003_R03C01 0.07717659 0.88010292  0.5844421  0.5866750  0.9168327  0.8640985
## 200223270003_R06C01 0.08253743 0.51121855  0.6245558  0.4046471  0.6042521  0.8259149
## 200223270003_R07C01 0.06217431 0.09131476  0.9283683  0.6014355  0.9062231  0.8333940
## 200223270006_R01C01 0.11888769 0.91529345  0.5715416  0.5774881  0.9083515  0.8761177
## 200223270006_R04C01 0.08885846 0.91911405  0.6534650  0.8756826  0.6383270  0.8585363
##                     cg17738613 cg01921484 cg10369879 cg27341708 cg12534577 cg18821122
## 200223270003_R02C01  0.6879612 0.90985496  0.9218784 0.48846610  0.8585231  0.9291309
## 200223270003_R03C01  0.6582258 0.90931369  0.3149306 0.02613847  0.8493466  0.5901603
## 200223270003_R06C01  0.1022257 0.92044873  0.9141081 0.86893582  0.8395241  0.5779620
## 200223270003_R07C01  0.8960156 0.91674311  0.9054415 0.02642300  0.8511384  0.9251431
## 200223270006_R01C01  0.8850702 0.02943747  0.2917862 0.47573455  0.8804655  0.9217018
## 200223270006_R04C01  0.8481916 0.89057041  0.9200403 0.89411974  0.3029013  0.5412250
##                     cg12682323 cg05234269 cg20685672 cg12228670 cg11331837 cg01680303
## 200223270003_R02C01  0.9397956 0.93848584 0.67121006  0.8632174 0.03692842  0.5095174
## 200223270003_R03C01  0.9003940 0.57461229 0.79320906  0.8496212 0.57150125  0.1344941
## 200223270003_R06C01  0.9157877 0.02467208 0.66136456  0.8738949 0.03182862  0.7573869
## 200223270003_R07C01  0.9048877 0.56516794 0.80838304  0.8362189 0.03832164  0.4772204
## 200223270006_R01C01  0.1065347 0.94829529 0.08291414  0.8079694 0.93008298  0.1176263
## 200223270006_R04C01  0.8836232 0.56298286 0.84460055  0.6966666 0.54004452  0.5133033
##                     cg17421046  cg03088219 cg02356645 cg00322003 cg01013522 cg00272795
## 200223270003_R02C01  0.9026993 0.844002862  0.5105903  0.1759911  0.6251168 0.46365138
## 200223270003_R03C01  0.9112100 0.007435243  0.5833923  0.5702070  0.8862821 0.82839260
## 200223270003_R06C01  0.8952031 0.120155222  0.5701428  0.3077122  0.5425308 0.07231279
## 200223270003_R07C01  0.9268852 0.826554308  0.5683381  0.6104341  0.8429862 0.78303831
## 200223270006_R01C01  0.1118337 0.066294915  0.5233692  0.6147419  0.0480531 0.78219952
## 200223270006_R04C01  0.4174370 0.574738383  0.9188670  0.2293759  0.8240222 0.44408249
##                     cg25758034 cg26474732 cg16579946 cg07523188 cg11187460 cg14527649
## 200223270003_R02C01  0.6114028  0.7843252  0.6306315  0.7509183 0.03672179  0.2678912
## 200223270003_R03C01  0.6649219  0.8184088  0.6648766  0.1524386 0.92516409  0.7954683
## 200223270003_R06C01  0.2393844  0.7358417  0.6455081  0.7127592 0.03109553  0.8350610
## 200223270003_R07C01  0.7071501  0.7509296  0.8979650  0.8464983 0.53283119  0.8428684
## 200223270006_R01C01  0.2301078  0.8294938  0.6886498  0.7847738 0.54038146  0.8231348
## 200223270006_R04C01  0.6891513  0.8033167  0.6766907  0.8231277 0.91096169  0.8022444
##                     cg20370184 cg17429539 cg20507276 cg13885788 cg16178271 cg10738648
## 200223270003_R02C01 0.37710950  0.7860900 0.12238910  0.9380618  0.6445416 0.44931577
## 200223270003_R03C01 0.05737964  0.7100923 0.38721972  0.9369476  0.6178075 0.49894016
## 200223270003_R06C01 0.04740505  0.7660838 0.47978438  0.5163017  0.6641952 0.05552024
## 200223270003_R07C01 0.83572095  0.6984969 0.02261996  0.9183376  0.7148058 0.03730440
## 200223270006_R01C01 0.04056608  0.6508597 0.37465798  0.5525542  0.6138954 0.54952781
## 200223270006_R04C01 0.04038589  0.2828452 0.03570795  0.9328289  0.9414188 0.59358167
##                     cg26069044 cg25879395 cg06112204 cg23161429 cg25436480 cg26757229
## 200223270003_R02C01 0.92401867 0.88130864  0.5251592  0.8956965 0.84251599  0.6723726
## 200223270003_R03C01 0.94072227 0.02603438  0.8773488  0.9099619 0.49940321  0.1422661
## 200223270003_R06C01 0.93321315 0.91060615  0.8867975  0.8833895 0.34943119  0.7933794
## 200223270003_R07C01 0.56567694 0.89205942  0.5613799  0.9134709 0.85244913  0.8074830
## 200223270006_R01C01 0.94369927 0.47886249  0.9184122  0.8738558 0.44545117  0.5265692
## 200223270006_R04C01 0.02040391 0.02145248  0.9152514  0.9104210 0.02575036  0.7341953
##                     cg02932958 cg18339359 cg23916408 cg06950937 cg12784167 cg07480176
## 200223270003_R02C01  0.7901008  0.8824858  0.1942275  0.8910968 0.81503498  0.5171664
## 200223270003_R03C01  0.4210489  0.9040272  0.9154993  0.2889345 0.02811410  0.3760452
## 200223270003_R06C01  0.3825995  0.8552121  0.8886255  0.9143801 0.03073269  0.6998389
## 200223270003_R07C01  0.7617081  0.3073106  0.8872447  0.8891079 0.84775699  0.2189042
## 200223270006_R01C01  0.8431126  0.8973742  0.2219945  0.8868617 0.83825789  0.5570021
## 200223270006_R04C01  0.7610084  0.2292800  0.1520624  0.9093273 0.45475291  0.4501196
##                     cg15865722 cg27577781 cg05321907 cg03660162 cg07138269 cg20139683
## 200223270003_R02C01 0.89438595  0.8143535  0.2880477  0.8691767  0.5002290  0.8717075
## 200223270003_R03C01 0.90194372  0.8113185  0.1782629  0.5160770  0.9426707  0.9059433
## 200223270003_R06C01 0.92118977  0.8144274  0.8427929  0.9026304  0.5057781  0.8962554
## 200223270003_R07C01 0.09230759  0.7970617  0.8320504  0.5305691  0.9400527  0.9218012
## 200223270006_R01C01 0.93422668  0.8640044  0.2422218  0.9257451  0.9321602  0.1708472
## 200223270006_R04C01 0.92220002  0.8840237  0.2429551  0.8935772  0.9333501  0.1067122
##                     cg12284872 cg03327352 cg23658987 cg21854924 cg21697769 cg19512141
## 200223270003_R02C01  0.8008333  0.8851712 0.79757644  0.8729132  0.8946108  0.8209161
## 200223270003_R03C01  0.7414569  0.8786878 0.07511718  0.7162342  0.2822953  0.7903543
## 200223270003_R06C01  0.7725267  0.3042310 0.10177571  0.7520990  0.8698740  0.8404684
## 200223270003_R07C01  0.7573369  0.8273211 0.46747992  0.8641284  0.9134887  0.2202759
## 200223270006_R01C01  0.7201607  0.8774082 0.76831297  0.6498895  0.2683820  0.8059589
## 200223270006_R04C01  0.8021446  0.8829492 0.08988532  0.5943113  0.2765740  0.7020247
##                     cg08198851 cg00675157 cg01153376 cg01933473 cg12776173 cg14564293
## 200223270003_R02C01  0.6578905  0.9188438  0.4872148  0.2589014 0.10388038 0.52089591
## 200223270003_R03C01  0.6578186  0.9242325  0.9639670  0.6726133 0.87306345 0.04000662
## 200223270003_R06C01  0.1272153  0.9254708  0.2242410  0.2642560 0.70094907 0.04959460
## 200223270003_R07C01  0.8351465  0.5447244  0.5155654  0.1978068 0.11367159 0.03114773
## 200223270006_R01C01  0.8791156  0.5173554  0.9588916  0.7599441 0.09458405 0.51703196
## 200223270006_R04C01  0.1423737  0.9247232  0.9586876  0.7405661 0.86532175 0.51535010
##                     cg24851651 cg22274273 cg25561557 cg21209485 cg10985055 cg14293999
## 200223270003_R02C01 0.03674702  0.4209386 0.76736369  0.8865053  0.8518169  0.2836710
## 200223270003_R03C01 0.05358297  0.4246379 0.03851635  0.8714878  0.8631895  0.9172023
## 200223270003_R06C01 0.05968923  0.4196796 0.47259480  0.2292550  0.5456633  0.9168166
## 200223270003_R07C01 0.60864179  0.4164100 0.43364249  0.2351526  0.8825100  0.9188336
## 200223270006_R01C01 0.08825834  0.7951105 0.46211439  0.8882046  0.8841690  0.1971116
## 200223270006_R04C01 0.91932068  0.0229810 0.44651530  0.2292483  0.8407797  0.9030919
##                     cg18819889 cg24506579 cg19377607 cg06697310 cg00696044 cg01549082
## 200223270003_R02C01  0.9156157  0.5244337 0.05377464  0.8454609 0.55608424  0.2924138
## 200223270003_R03C01  0.9004455  0.5794845 0.90570746  0.8653044 0.07552381  0.7065693
## 200223270003_R06C01  0.9054439  0.9427785 0.06636174  0.2405168 0.79270858  0.2895440
## 200223270003_R07C01  0.9089935  0.9323844 0.68788639  0.8479193 0.03548419  0.6422955
## 200223270006_R01C01  0.9065397  0.9185355 0.06338988  0.8206613 0.10714386  0.8471236
## 200223270006_R04C01  0.9242767  0.4332642 0.91551446  0.7839595 0.18420803  0.6949888
##                     cg01128042 cg00999469 cg06118351 cg12012426 cg08584917 cg27272246
## 200223270003_R02C01  0.9113420  0.3274080 0.36339400  0.9165048  0.5663205  0.8615873
## 200223270003_R03C01  0.5328806  0.2857719 0.47148604  0.9434768  0.9019732  0.8705287
## 200223270003_R06C01  0.5222757  0.2499229 0.86559618  0.9220044  0.9187789  0.8103777
## 200223270003_R07C01  0.5141721  0.2819622 0.83494303  0.9241284  0.6007449  0.0310881
## 200223270006_R01C01  0.9321215  0.2933539 0.02632111  0.9327143  0.9069098  0.7686536
## 200223270006_R04C01  0.5050081  0.2966623 0.83329300  0.9271167  0.9263584  0.4403542
##                     cg15633912 cg16788319 cg17906851 cg07028768 cg27086157 cg14240646
## 200223270003_R02C01  0.1605530  0.9379870  0.9488392  0.4496851  0.9224112  0.5391334
## 200223270003_R03C01  0.9333421  0.8913429  0.9529718  0.8536078  0.9219304  0.2538363
## 200223270003_R06C01  0.8737362  0.8680680  0.6462151  0.8356936  0.3224986  0.1864902
## 200223270003_R07C01  0.9137334  0.8811444  0.9553497  0.4245893  0.3455486  0.6402007
## 200223270006_R01C01  0.9169706  0.3123481  0.6222117  0.8835151  0.8988962  0.7696079
## 200223270006_R04C01  0.8890004  0.2995627  0.6441202  0.4514661  0.9159217  0.1490028
##                     cg00154902 cg14307563 cg02320265 cg08779649 cg04664583 cg12466610
## 200223270003_R02C01  0.5137741  0.1855966  0.8853213 0.44449401  0.5572814 0.05767659
## 200223270003_R03C01  0.8540746  0.8916957  0.4686314 0.45076825  0.5881190 0.59131778
## 200223270003_R06C01  0.8188126  0.8750052  0.4838749 0.04810217  0.9352717 0.06939623
## 200223270003_R07C01  0.4625776  0.8975663  0.8986848 0.42715969  0.9350230 0.04527733
## 200223270006_R01C01  0.4690086  0.8762842  0.8987560 0.89313476  0.9424588 0.05212904
## 200223270006_R04C01  0.4547219  0.9168614  0.4768520 0.59523771  0.9379537 0.05104033
##                     cg27639199 cg15501526 cg00689685 cg01413796 cg11247378  age.now
## 200223270003_R02C01 0.67515415  0.6362531  0.7019389  0.1345128  0.1591185 82.40000
## 200223270003_R03C01 0.67552763  0.6319253  0.8634268  0.2830672  0.7874849 78.60000
## 200223270003_R06C01 0.06233093  0.7435100  0.6378795  0.8194681  0.4807942 80.40000
## 200223270003_R07C01 0.05701332  0.7756577  0.8624541  0.9007710  0.4537348 78.16441
## 200223270006_R01C01 0.05037694  0.3230777  0.6361891  0.2603027  0.1537079 62.90000
## 200223270006_R04C01 0.08144161  0.8342695  0.6356260  0.9207672  0.1686356 80.67796
print(df_process_frequency_FeatureName)
##   [1] "PC1"        "PC2"        "PC3"        "cg00962106" "cg02225060" "cg14710850" "cg27452255"
##   [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
##  [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555" "cg08857872"
##  [22] "cg06864789" "cg14924512" "cg16652920" "cg03084184" "cg26219488" "cg20913114" "cg06378561"
##  [29] "cg26948066" "cg25259265" "cg06536614" "cg24859648" "cg12279734" "cg03982462" "cg05841700"
##  [36] "cg11227702" "cg12146221" "cg02621446" "cg00616572" "cg15535896" "cg02372404" "cg09854620"
##  [43] "cg04248279" "cg20678988" "cg24861747" "cg10240127" "cg16771215" "cg01667144" "cg13080267"
##  [50] "cg02494911" "cg10750306" "cg11438323" "cg06715136" "cg04412904" "cg12738248" "cg03071582"
##  [57] "cg05570109" "cg15775217" "cg24873924" "cg17738613" "cg01921484" "cg10369879" "cg27341708"
##  [64] "cg12534577" "cg18821122" "cg12682323" "cg05234269" "cg20685672" "cg12228670" "cg11331837"
##  [71] "cg01680303" "cg17421046" "cg03088219" "cg02356645" "cg00322003" "cg01013522" "cg00272795"
##  [78] "cg25758034" "cg26474732" "cg16579946" "cg07523188" "cg11187460" "cg14527649" "cg20370184"
##  [85] "cg17429539" "cg20507276" "cg13885788" "cg16178271" "cg10738648" "cg26069044" "cg25879395"
##  [92] "cg06112204" "cg23161429" "cg25436480" "cg26757229" "cg02932958" "cg18339359" "cg23916408"
##  [99] "cg06950937" "cg12784167" "cg07480176" "cg15865722" "cg27577781" "cg05321907" "cg03660162"
## [106] "cg07138269" "cg20139683" "cg12284872" "cg03327352" "cg23658987" "cg21854924" "cg21697769"
## [113] "cg19512141" "cg08198851" "cg00675157" "cg01153376" "cg01933473" "cg12776173" "cg14564293"
## [120] "cg24851651" "cg22274273" "cg25561557" "cg21209485" "cg10985055" "cg14293999" "cg18819889"
## [127] "cg24506579" "cg19377607" "cg06697310" "cg00696044" "cg01549082" "cg01128042" "cg00999469"
## [134] "cg06118351" "cg12012426" "cg08584917" "cg27272246" "cg15633912" "cg16788319" "cg17906851"
## [141] "cg07028768" "cg27086157" "cg14240646" "cg00154902" "cg14307563" "cg02320265" "cg08779649"
## [148] "cg04664583" "cg12466610" "cg27639199" "cg15501526" "cg00689685" "cg01413796" "cg11247378"
## [155] "age.now"
Importance of these features:
Selected_Frequency_Feature_importance <-all_Output_combined_df_impAvg[all_Output_combined_df_impAvg$Total_Count>=3,]
print(Selected_Frequency_Feature_importance)
##       Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1
## 1     age.now   1   1   1  1   1           5      0.00000000    1.000000000     0.000000000
## 2  cg00154902   1   1   1  1   1           5      0.08879263    0.268834896     0.371315930
## 3  cg00247094   1   1   1  1   1           5      0.41278095    0.218540778     0.424503057
## 4  cg00272795   1   1   1  1   1           5      0.21295491    0.198550953     0.230999937
## 5  cg00322003   1   1   1  1   1           5      0.21752832    0.146570216     0.343053068
## 6  cg00616572   1   1   1  1   1           5      0.28381319    0.171559535     0.357284511
## 7  cg00675157   1   1   1  1   1           5      0.14546770    0.091681870     0.329939993
## 8  cg00689685   1   1   1  1   1           5      0.04219397    0.101123505     0.210886561
## 9  cg00696044   1   1   1  1   1           5      0.12999136    0.173039136     0.281254843
## 10 cg00962106   1   1   1  1   1           5      0.62818798    0.527922660     0.727596599
## 11 cg00999469   1   1   1  1   1           5      0.11924999    0.172508732     0.193113515
## 12 cg01013522   1   1   1  1   1           5      0.21592960    0.241888711     0.316918954
## 13 cg01128042   1   1   1  1   1           5      0.12518582    0.423396901     0.284997767
## 14 cg01153376   1   1   1  1   1           5      0.14471902    0.331692564     0.329009655
## 15 cg01413796   1   1   1  1   1           5      0.02161027    0.256050130     0.105363369
## 16 cg01549082   1   1   1  1   1           5      0.12522345    0.157926568     0.009399384
## 17 cg01667144   1   1   1  1   1           5      0.26504058    0.258542574     0.297490033
## 18 cg01680303   1   1   1  1   1           5      0.22195009    0.143689906     0.298057362
## 19 cg01921484   1   1   1  1   1           5      0.23359943    0.448455125     0.384660546
## 20 cg01933473   1   1   1  1   1           5      0.14405977    0.138623149     0.165864840
## 21 cg02225060   1   1   1  1   1           5      0.50844099    0.189218518     0.617365165
## 22 cg02320265   1   1   1  1   1           5      0.07922128    0.200640351     0.176092180
## 23 cg02356645   1   1   1  1   1           5      0.21753719    0.174918028     0.301971691
## 24 cg02372404   1   1   1  1   1           5      0.27764868    0.159713319     0.450667096
## 25 cg02494911   1   1   1  1   1           5      0.26111760    0.365368878     0.332216396
## 26 cg02621446   1   1   1  1   1           5      0.28474428    0.413508159     0.346649379
## 27 cg02932958   1   1   1  1   1           5      0.18341414    0.030277794     0.271954786
## 28 cg02981548   1   1   1  1   1           5      0.48692573    0.409443001     0.586910968
## 29 cg03071582   1   1   1  1   1           5      0.23936745    0.097173746     0.269629005
## 30 cg03084184   1   1   1  1   1           5      0.34207903    0.177639114     0.391669668
## 31 cg03088219   1   1   1  1   1           5      0.21783076    0.266905948     0.252647943
## 32 cg03129555   1   1   1  1   1           5      0.38181198    0.262127746     0.341877381
## 33 cg03327352   1   1   1  1   1           5      0.15990027    0.379266748     0.308022027
## 34 cg03660162   1   1   1  1   1           5      0.16307948    0.075703296     0.368700827
## 35 cg03982462   1   1   1  1   1           5      0.30254946    0.041317912     0.442784617
## 36 cg04248279   1   1   1  1   1           5      0.27148602    0.351871408     0.328078798
## 37 cg04412904   1   1   1  1   1           5      0.24639764    0.482166922     0.341662549
## 38 cg04664583   1   1   1  1   1           5      0.07389089    0.025123400     0.197066667
## 39 cg05096415   1   1   1  1   1           5      0.44533457    0.586739835     0.414828632
## 40 cg05234269   1   1   1  1   1           5      0.22837191    0.172234686     0.338291448
## 41 cg05321907   1   1   1  1   1           5      0.16624900    0.066630222     0.227854962
## 42 cg05570109   1   1   1  1   1           5      0.23797548    0.144787573     0.385926710
## 43 cg05841700   1   1   1  1   1           5      0.30170848    0.155412051     0.349467184
## 44 cg06112204   1   1   1  1   1           5      0.19129682    0.020074093     0.239077055
## 45 cg06118351   1   1   1  1   1           5      0.11824394    0.057558412     0.263145036
## 46 cg06378561   1   1   1  1   1           5      0.33046434    0.254873358     0.319057217
## 47 cg06536614   1   1   1  1   1           5      0.32793798    0.202932339     0.433426126
## 48 cg06697310   1   1   1  1   1           5      0.13063945    0.237623033     0.326843827
## 49 cg06715136   1   1   1  1   1           5      0.24950260    0.191920615     0.349420903
## 50 cg06864789   1   1   1  1   1           5      0.36409272    0.503835929     0.460586631
## 51 cg06950937   1   1   1  1   1           5      0.18053327    0.277466641     0.235188883
## 52 cg07028768   1   1   1  1   1           5      0.10714823    0.116925946     0.398410973
## 53 cg07138269   1   1   1  1   1           5      0.16208302    0.092407396     0.307628197
## 54 cg07152869   1   1   1  1   1           5      0.46401454    0.212363413     0.539218307
## 55 cg07480176   1   1   1  1   1           5      0.17577373    0.009422593     0.271014784
## 56 cg07523188   1   1   1  1   1           5      0.20694262    0.080073043     0.293838589
## 57 cg08198851   1   1   1  1   1           5      0.14937223    0.289014103     0.281424533
## 58 cg08584917   1   1   1  1   1           5      0.11176227    0.138616247     0.299594690
## 59 cg08779649   1   1   1  1   1           5      0.07612126    0.161307390     0.166351371
## 60 cg08857872   1   1   1  1   1           5      0.38088074    0.467593325     0.531139947
## 61 cg08861434   1   1   1  1   1           5      0.48302924    0.235876875     0.493132245
## 62 cg09584650   1   1   1  1   1           5      0.41042125    0.457112879     0.477067244
## 63 cg09854620   1   1   1  1   1           5      0.27298526    0.183209797     0.344318733
## 64 cg10240127   1   1   1  1   1           5      0.27027622    0.213287988     0.425197502
## 65 cg10369879   1   1   1  1   1           5      0.23210094    0.183817567     0.316828724
## 66 cg10738648   1   1   1  1   1           5      0.19474268    0.224614703     0.270553023
## 67 cg10750306   1   1   1  1   1           5      0.25985394    0.086335295     0.291945624
## 68 cg10985055   1   1   1  1   1           5      0.13743727    0.102700187     0.162868541
## 69 cg11133939   1   1   1  1   1           5      0.40082643    0.201244345     0.473727841
## 70 cg11187460   1   1   1  1   1           5      0.20693049    0.145437291     0.183459129
## 71 cg11227702   1   1   1  1   1           5      0.29375910    0.049153501     0.304770879
## 72 cg11247378   1   1   1  1   1           5      0.01493278    0.195085308     0.258188066
## 73 cg11331837   1   1   1  1   1           5      0.22214980    0.219715757     0.276642306
## 74 cg11438323   1   1   1  1   1           5      0.24970672    0.127152455     0.289005707
## 75 cg12012426   1   1   1  1   1           5      0.11249753    0.317869115     0.228603140
## 76 cg12146221   1   1   1  1   1           5      0.28574803    0.361724220     0.305210343
##    Importance_RF Importance_SVM Average_Importance
## 1     0.45372158      0.8333333          0.4574110
## 2     0.33502754      0.5833333          0.3294609
## 3     0.23013585      0.5833333          0.3738588
## 4     0.09024509      0.3333333          0.2132168
## 5     0.27821774      0.5833333          0.3137405
## 6     0.17891065      0.6666667          0.3316469
## 7     0.16641199      0.5000000          0.2467003
## 8     0.16712976      0.4166667          0.1876001
## 9     0.14117479      0.4166667          0.2284254
## 10    0.46613808      0.2500000          0.5199691
## 11    0.26372499      0.7500000          0.2997194
## 12    0.25694826      0.6666667          0.3396704
## 13    0.22999345      0.6666667          0.3460481
## 14    0.72836889      0.2500000          0.3567580
## 15    0.08143128      0.1666667          0.1262243
## 16    0.25808487      0.0000000          0.1101269
## 17    0.24489722      0.4166667          0.2965274
## 18    0.24811633      0.6666667          0.3156961
## 19    0.41690692      0.7500000          0.4467244
## 20    0.15765991      0.3333333          0.1879082
## 21    0.23091637      0.4166667          0.3925215
## 22    0.23921737      0.4166667          0.2223676
## 23    0.05315135      0.6666667          0.2828490
## 24    0.20775241      0.5833333          0.3358230
## 25    0.39679341      0.5000000          0.3710993
## 26    0.33321305      0.5000000          0.3756230
## 27    0.08617858      0.5833333          0.2310317
## 28    0.22577927      0.4166667          0.4251451
## 29    0.15550558      0.4166667          0.2356685
## 30    0.31009075      0.3333333          0.3109624
## 31    0.15279655      0.4166667          0.2613696
## 32    0.11859695      0.5833333          0.3375495
## 33    0.23579212      0.4166667          0.2999296
## 34    0.11594685      0.5833333          0.2613528
## 35    0.21066649      0.4166667          0.2827970
## 36    0.12491500      0.3333333          0.2819369
## 37    0.28390081      0.4166667          0.3541589
## 38    0.33357822      0.3333333          0.1925985
## 39    0.28495711      0.7500000          0.4963720
## 40    0.26421598      0.6666667          0.3339561
## 41    0.26338041      0.5000000          0.2448229
## 42    0.38455253      0.5000000          0.3306485
## 43    0.01805247      0.6666667          0.2982614
## 44    0.14283075      0.5000000          0.2186557
## 45    0.29018940      0.4166667          0.2291607
## 46    0.11176792      0.5833333          0.3198992
## 47    0.04798536      0.5000000          0.3024564
## 48    0.24776591      0.5833333          0.3052411
## 49    0.20299016      0.5833333          0.3154335
## 50    0.46960918      0.5000000          0.4596249
## 51    0.18893246      0.3333333          0.2430909
## 52    0.21571670      0.4166667          0.2509737
## 53    0.06844617      0.4166667          0.2094463
## 54    0.17108416      0.6666667          0.4106694
## 55    0.16116737      0.5833333          0.2401424
## 56    0.34960936      0.5000000          0.2860927
## 57    0.16855829      0.5000000          0.2776738
## 58    0.08221573      0.3333333          0.1931045
## 59    0.13947791      0.5000000          0.2086516
## 60    0.56010360      0.4166667          0.4712769
## 61    0.14267663      0.6666667          0.4042763
## 62    0.23050169      0.5833333          0.4316873
## 63    0.32387790      0.5000000          0.3248783
## 64    0.31291846      0.5000000          0.3443360
## 65    0.29168017      0.5000000          0.3048855
## 66    0.32817697      0.5000000          0.3036175
## 67    0.07977978      0.5833333          0.2602496
## 68    0.26635858      0.3333333          0.2005396
## 69    0.34044653      0.5000000          0.3832490
## 70    0.22891961      0.4166667          0.2362826
## 71    0.04105989      0.4166667          0.2210820
## 72    0.15726047      0.5833333          0.2417600
## 73    0.33393787      0.3333333          0.2771558
## 74    0.03589151      0.5833333          0.2570179
## 75    0.07484149      0.4166667          0.2300956
## 76    0.21590054      0.5000000          0.3337166
##  [ reached 'max' / getOption("max.print") -- omitted 79 rows ]
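The Total_Count column above implements the frequency / common-feature rule from the version notes: each model contributes its Top-N feature list, and a feature is kept as "common" when it appears in at least half of the models. A minimal sketch of that counting step, using hypothetical per-model top-feature vectors (the names below are illustrative, not taken from the real model output):

```r
# Sketch, not the pipeline code: hypothetical Top-N feature names per model.
top_per_model <- list(
  LRM = c("age.now", "cg00154902", "cg00247094"),
  XGB = c("age.now", "cg00154902", "cg99999999"),
  ENM = c("age.now", "cg00247094", "cg88888888"),
  RF  = c("age.now", "cg00154902", "cg00247094"),
  SVM = c("age.now", "cg77777777", "cg00247094")
)

# Step 2: count how many models ranked each feature in their Top-N.
counts <- table(unlist(top_per_model))

# Step 3: keep features appearing in at least half of the models (>= 3 of 5 here),
# matching the Total_Count >= 3 filter used above.
common_features <- names(counts)[counts >= ceiling(length(top_per_model) / 2)]
print(common_features)
```

With five models this threshold is 3, which is why the filter on `all_Output_combined_df_impAvg` uses `Total_Count>=3`.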

8.2 Output - Write Files

Data Frame with selected features

# Output the data frame with selected features based on the mean method:
# "selected_impAvg_ordered_NAME". This data frame does not have a column named "SampleID".

if(Flag_8mean){
  filename_mean <- paste0("Selected_mean", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_mean <- paste0(OUTUT_CSV_PATHNAME, filename_mean)
  if (file.exists(OUTPUTPATH_mean)) {
    print("selected file based on mean already exists")
  } else {
    write.csv(df_selected_Mean,
              file = OUTPUTPATH_mean,
              row.names = FALSE)
  }
}
if(Flag_8median){
  filename_median <- paste0("Selected_median", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_median <- paste0(OUTUT_CSV_PATHNAME, filename_median)
  if (file.exists(OUTPUTPATH_median)) {
    print("selected file based on median already exists")
  } else {
    write.csv(df_selected_Median,
              file = OUTPUTPATH_median,
              row.names = FALSE)
  }
}
if(Flag_8Fequency){
  filename_frequency <- paste0("Selected_frequency", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_frequency <- paste0(OUTUT_CSV_PATHNAME, filename_frequency)
  if (file.exists(OUTPUTPATH_frequency)) {
    print("selected file based on frequency already exists")
  } else {
    write.csv(df_process_Output_freq,
              file = OUTPUTPATH_frequency,
              row.names = FALSE)
  }
}
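The three chunks above repeat the same write-if-absent pattern; only the method label and the data frame change. One possible refactor is a shared helper (a sketch only, assuming the same `OUTUT_CSV_PATHNAME`, flags, and data frames defined earlier in this report):

```r
# Sketch of a shared helper for the write-if-absent pattern above.
# `df` is the selected-features data frame, `method` the selection-method label.
write_selected_features <- function(df, method, n_features, out_dir) {
  out_path <- paste0(out_dir, "Selected_", method, "_", n_features, "_Features.csv")
  if (file.exists(out_path)) {
    message("selected file based on ", method, " already exists")
  } else {
    write.csv(df, file = out_path, row.names = FALSE)
  }
  invisible(out_path)
}

# Usage mirroring the three flag-guarded blocks:
# if (Flag_8mean)     write_selected_features(df_selected_Mean, "mean", INPUT_NUMBER_FEATURES, OUTUT_CSV_PATHNAME)
# if (Flag_8median)   write_selected_features(df_selected_Median, "median", INPUT_NUMBER_FEATURES, OUTUT_CSV_PATHNAME)
# if (Flag_8Fequency) write_selected_features(df_process_Output_freq, "frequency", INPUT_NUMBER_FEATURES, OUTUT_CSV_PATHNAME)
```

This keeps the file-naming convention and skip-if-exists behavior in one place, so the printed message always matches the method actually being written.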

Phenotype Data Frame

# This is the flag for phenotype data output.
# If set to TRUE, check whether the file already exists at the given path;
# if it does not, write the file, otherwise skip writing.
# If set to FALSE, do not output the phenotype file.
# NOTICE: the phenotype file is selected from "Merged_df".

phenotypeDF<-merged_df_raw[,colnames(phenoticPart_RAW)]
print(head(phenotypeDF))
##                                barcodes RID.a     prop.B    prop.NK   prop.CD4T  prop.CD8T
## 200223270003_R02C01 200223270003_R02C01  2190 0.03164651 0.03609239 0.010771839 0.01481567
## 200223270003_R03C01 200223270003_R03C01  4080 0.03556363 0.04697771 0.002321312 0.06381941
## 200223270003_R06C01 200223270003_R06C01  4505 0.07129589 0.04412218 0.037684081 0.11457236
## 200223270003_R07C01 200223270003_R07C01  1010 0.02081699 0.07117668 0.040966085 0.00000000
## 200223270006_R01C01 200223270006_R01C01  4226 0.02680465 0.04767947 0.128514873 0.09085886
## 200223270006_R04C01 200223270006_R04C01  1190 0.07063013 0.05250647 0.064529118 0.04309168
##                      prop.Mono prop.Neutro prop.Eosino       DX  age.now PTGENDER  ABETA   TAU
## 200223270003_R02C01 0.06533409   0.8413395           0      MCI 82.40000     Male  963.2 341.5
## 200223270003_R03C01 0.04901806   0.8022999           0       CN 78.60000   Female  950.6 295.9
## 200223270003_R06C01 0.08745402   0.6448715           0       CN 80.40000   Female 1705.0 353.2
## 200223270003_R07C01 0.04459325   0.8224470           0 Dementia 78.16441     Male  493.3 272.8
## 200223270006_R01C01 0.07419209   0.6319501           0      MCI 62.90000   Female 1705.0 253.1
## 200223270006_R04C01 0.08796080   0.6812818           0       CN 80.67796   Female 1336.0 439.3
##                      PTAU          PC1           PC2          PC3   ageGroup ageGroupsq DX_num
## 200223270003_R02C01 35.48 -0.214185447  1.470293e-02 -0.014043316  0.6606949 0.43651772      0
## 200223270003_R03C01 28.08 -0.172761185  5.745834e-02  0.005055871  0.2806949 0.07878961      0
## 200223270003_R06C01 28.49 -0.003667305  8.372861e-02  0.029143653  0.4606949 0.21223977      0
## 200223270003_R07C01 22.75 -0.186779607 -1.117250e-02 -0.032302430  0.2371357 0.05623333      1
## 200223270006_R01C01 22.84  0.026814649  1.650735e-05  0.052947950 -1.2893051 1.66230770      0
## 200223270006_R04C01 40.78 -0.037862929  1.571950e-02 -0.008685676  0.4884909 0.23862336      0
##                     uniqueID  Horvath
## 200223270003_R02C01        1 61.50365
## 200223270003_R03C01        1 69.26678
## 200223270003_R06C01        1 96.84418
## 200223270003_R07C01        1 61.76446
## 200223270006_R01C01        1 59.33885
## 200223270006_R04C01        1 70.27197
OUTPUTPATH_phenotypePart <- paste0(OUTUT_CSV_PATHNAME, "PhenotypePart_df.csv")

if (phenoOutPUt_FLAG) {
  if (file.exists(OUTPUTPATH_phenotypePart)) {
    print("Phenotype File already exists")
  } else {
    write.csv(phenotypeDF, file = OUTPUTPATH_phenotypePart, row.names = FALSE)
  }
}
## [1] "Phenotype File already exists"

9. Selected Feature Performance

9.1 Selected Based on Mean

9.1.1 Input Feature For Evaluation

Performance of the output features selected based on the mean method

processed_dataFrame<-df_selected_Mean
processed_data<-output_mean_process

AfterProcess_FeatureName<-selected_impAvg_ordered_NAME
print(head(output_mean_process))
## # A tibble: 6 × 156
##   DX            PC1 cg00962106        PC2 cg05096415 cg08857872 cg23432430 cg16652920 cg06864789
##   <fct>       <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 MCI      -0.214        0.912  0.0147         0.918      0.340      0.948      0.944     0.0537
## 2 CN       -0.173        0.538  0.0575         0.518      0.818      0.946      0.943     0.461 
## 3 CN       -0.00367      0.504  0.0837         0.629      0.297      0.942      0.946     0.875 
## 4 Dementia -0.187        0.904 -0.0112         0.606      0.295      0.943      0.942     0.490 
## 5 MCI       0.0268       0.896  0.0000165      0.560      0.894      0.946      0.953     0.479 
## 6 CN       -0.0379       0.886  0.0157         0.544      0.890      0.951      0.949     0.0542
## # ℹ 147 more variables: age.now <dbl>, cg01921484 <dbl>, cg26948066 <dbl>, cg17186592 <dbl>,
## #   cg09584650 <dbl>, cg12279734 <dbl>, cg02981548 <dbl>, cg14710850 <dbl>, PC3 <dbl>,
## #   cg07152869 <dbl>, cg08861434 <dbl>, cg15501526 <dbl>, cg25259265 <dbl>, cg02225060 <dbl>,
## #   cg24859648 <dbl>, cg11133939 <dbl>, cg25879395 <dbl>, cg02621446 <dbl>, cg00247094 <dbl>,
## #   cg02494911 <dbl>, cg16771215 <dbl>, cg24861747 <dbl>, cg01153376 <dbl>, cg04412904 <dbl>,
## #   cg20913114 <dbl>, cg01128042 <dbl>, cg10240127 <dbl>, cg14564293 <dbl>, cg16749614 <dbl>,
## #   cg01013522 <dbl>, cg16579946 <dbl>, cg03129555 <dbl>, cg02372404 <dbl>, cg05234269 <dbl>, …
print(selected_impAvg_ordered_NAME)
##   [1] "PC1"        "cg00962106" "PC2"        "cg05096415" "cg08857872" "cg23432430" "cg16652920"
##   [8] "cg06864789" "age.now"    "cg01921484" "cg26948066" "cg17186592" "cg09584650" "cg12279734"
##  [15] "cg02981548" "cg14710850" "PC3"        "cg07152869" "cg08861434" "cg15501526" "cg25259265"
##  [22] "cg02225060" "cg24859648" "cg11133939" "cg25879395" "cg02621446" "cg00247094" "cg02494911"
##  [29] "cg16771215" "cg24861747" "cg01153376" "cg04412904" "cg20913114" "cg01128042" "cg10240127"
##  [36] "cg14564293" "cg16749614" "cg01013522" "cg16579946" "cg03129555" "cg02372404" "cg05234269"
##  [43] "cg12146221" "cg12228670" "cg14924512" "cg27452255" "cg16715186" "cg00616572" "cg05570109"
##  [50] "cg00154902" "cg14293999" "cg17421046" "cg15775217" "cg09854620" "cg19503462" "cg26757229"
##  [57] "cg06378561" "cg01680303" "cg06715136" "cg15535896" "cg00322003" "cg27341708" "cg03084184"
##  [64] "cg26219488" "cg18339359" "cg06697310" "cg10369879" "cg10738648" "cg06536614" "cg26069044"
##  [71] "cg20685672" "cg03327352" "cg00999469" "cg23658987" "cg05841700" "cg01667144" "cg15865722"
##  [78] "cg13885788" "cg14527649" "cg23161429" "cg20370184" "cg18821122" "cg07523188" "cg12534577"
##  [85] "cg02356645" "cg03982462" "cg04248279" "cg13080267" "cg27639199" "cg08198851" "cg11331837"
##  [92] "cg24873924" "cg20507276" "cg25561557" "cg22274273" "cg12682323" "cg17738613" "cg21209485"
##  [99] "cg03088219" "cg03660162" "cg10750306" "cg27272246" "cg11438323" "cg12738248" "cg21854924"
## [106] "cg20139683" "cg16178271" "cg07028768" "cg26474732" "cg00675157" "cg23916408" "cg05321907"
## [113] "cg17429539" "cg06950937" "cg14240646" "cg27086157" "cg25758034" "cg11247378" "cg19377607"
## [120] "cg07480176" "cg27577781" "cg11187460" "cg03071582" "cg12284872" "cg02932958" "cg12012426"
## [127] "cg06118351" "cg00696044" "cg25436480" "cg02320265" "cg11227702" "cg18819889" "cg06112204"
## [134] "cg19512141" "cg24506579" "cg00272795" "cg21697769" "cg12776173" "cg07138269" "cg17906851"
## [141] "cg08779649" "cg10985055" "cg08584917" "cg04664583" "cg01933473" "cg00689685" "cg14307563"
## [148] "cg12784167" "cg24851651" "cg15633912" "cg12466610" "cg16788319" "cg20678988" "cg01413796"
## [155] "cg01549082"
print(head(df_selected_Mean))
##                           DX          PC1 cg00962106           PC2 cg05096415 cg08857872
## 200223270003_R02C01      MCI -0.214185447  0.9124898  1.470293e-02  0.9182527  0.3395280
## 200223270003_R03C01       CN -0.172761185  0.5375751  5.745834e-02  0.5177819  0.8181845
## 200223270003_R06C01       CN -0.003667305  0.5040948  8.372861e-02  0.6288426  0.2970779
## 200223270003_R07C01 Dementia -0.186779607  0.9039029 -1.117250e-02  0.6060271  0.2954090
## 200223270006_R01C01      MCI  0.026814649  0.8961556  1.650735e-05  0.5599588  0.8935876
## 200223270006_R04C01       CN -0.037862929  0.8857597  1.571950e-02  0.5441200  0.8901338
##                     cg23432430 cg16652920 cg06864789  age.now cg01921484 cg26948066 cg17186592
## 200223270003_R02C01  0.9482702  0.9436000 0.05369415 82.40000 0.90985496  0.4685225  0.9230463
## 200223270003_R03C01  0.9455418  0.9431222 0.46053125 78.60000 0.90931369  0.5026045  0.8593448
## 200223270003_R06C01  0.9418716  0.9457161 0.87513655 80.40000 0.92044873  0.9101976  0.8467599
## 200223270003_R07C01  0.9426559  0.9419785 0.49020327 78.16441 0.91674311  0.9379543  0.4986373
## 200223270006_R01C01  0.9461736  0.9529417 0.47852685 62.90000 0.02943747  0.9120181  0.8978999
## 200223270006_R04C01  0.9508404  0.9492648 0.05423587 80.67796 0.89057041  0.8868608  0.9239750
##                     cg09584650 cg12279734 cg02981548 cg14710850          PC3 cg07152869
## 200223270003_R02C01 0.08230254  0.6435368  0.1342571  0.8048592 -0.014043316  0.8284151
## 200223270003_R03C01 0.09661586  0.1494651  0.5220037  0.8090950  0.005055871  0.5050630
## 200223270003_R06C01 0.52399749  0.8760759  0.5098965  0.8285902  0.029143653  0.8352490
## 200223270003_R07C01 0.11587211  0.8674214  0.5660985  0.8336457 -0.032302430  0.5194300
## 200223270006_R01C01 0.42115185  0.6454450  0.5678714  0.8500725  0.052947950  0.5025709
## 200223270006_R04C01 0.56043178  0.8660058  0.5079859  0.8207247 -0.008685676  0.8080916
##                     cg08861434 cg15501526 cg25259265 cg02225060 cg24859648 cg11133939
## 200223270003_R02C01  0.8768306  0.6362531  0.4356646  0.6828159 0.83777536  0.1282694
## 200223270003_R03C01  0.4352647  0.6319253  0.8893591  0.8265195 0.44392797  0.5920898
## 200223270003_R06C01  0.8698813  0.7435100  0.4201700  0.5209552 0.03341185  0.5127706
## 200223270003_R07C01  0.4709249  0.7756577  0.4455517  0.8078889 0.43582347  0.8474176
## 200223270006_R01C01  0.8618532  0.3230777  0.8423337  0.6084903 0.03087161  0.8589133
## 200223270006_R04C01  0.9058965  0.8342695  0.8460736  0.7638781 0.02588024  0.5246557
##                     cg25879395 cg02621446 cg00247094 cg02494911 cg16771215 cg24861747
## 200223270003_R02C01 0.88130864  0.8731313  0.5399349  0.3049435 0.88389723  0.3540897
## 200223270003_R03C01 0.02603438  0.8095534  0.9315640  0.2416332 0.07196933  0.4309505
## 200223270003_R06C01 0.91060615  0.7511582  0.5177874  0.2520909 0.09949974  0.8071462
## 200223270003_R07C01 0.89205942  0.8773609  0.5377765  0.2457032 0.64234023  0.3347317
## 200223270006_R01C01 0.47886249  0.2046541  0.9109309  0.8045030 0.62679274  0.3544795
## 200223270006_R04C01 0.02145248  0.7963817  0.5266535  0.7489283 0.06970175  0.5997840
##                     cg01153376 cg04412904 cg20913114 cg01128042 cg10240127 cg14564293
## 200223270003_R02C01  0.4872148 0.05088595 0.36510482  0.9113420  0.9250553 0.52089591
## 200223270003_R03C01  0.9639670 0.07717659 0.80382984  0.5328806  0.9403255 0.04000662
## 200223270003_R06C01  0.2242410 0.08253743 0.03158439  0.5222757  0.9056974 0.04959460
## 200223270003_R07C01  0.5155654 0.06217431 0.81256840  0.5141721  0.9396217 0.03114773
## 200223270006_R01C01  0.9588916 0.11888769 0.81502059  0.9321215  0.9262370 0.51703196
## 200223270006_R04C01  0.9586876 0.08885846 0.90468830  0.5050081  0.9240497 0.51535010
##                     cg16749614 cg01013522 cg16579946 cg03129555 cg02372404 cg05234269
## 200223270003_R02C01  0.8678741  0.6251168  0.6306315  0.6079616 0.03598249 0.93848584
## 200223270003_R03C01  0.8539348  0.8862821  0.6648766  0.5785498 0.02767285 0.57461229
## 200223270003_R06C01  0.5874127  0.5425308  0.6455081  0.9137818 0.03127855 0.02467208
## 200223270003_R07C01  0.5555391  0.8429862  0.8979650  0.9043041 0.55685785 0.56516794
## 200223270006_R01C01  0.8026346  0.0480531  0.6886498  0.9286357 0.02587736 0.94829529
## 200223270006_R04C01  0.7903978  0.8240222  0.6766907  0.9088564 0.02828648 0.56298286
##                     cg12146221 cg12228670 cg14924512 cg27452255 cg16715186 cg00616572
## 200223270003_R02C01  0.2049284  0.8632174  0.5303907  0.9001010  0.2742789  0.9335067
## 200223270003_R03C01  0.1814927  0.8496212  0.9160885  0.6593379  0.7946153  0.9214079
## 200223270003_R06C01  0.8619250  0.8738949  0.9088414  0.9012217  0.8124316  0.9113633
## 200223270003_R07C01  0.1238469  0.8362189  0.9081681  0.8898635  0.7773263  0.9160238
## 200223270006_R01C01  0.2021598  0.8079694  0.9111789  0.5779792  0.8334531  0.4861334
## 200223270006_R04C01  0.1383786  0.6966666  0.5331753  0.8809143  0.8039945  0.9067928
##                     cg05570109 cg00154902 cg14293999 cg17421046 cg15775217 cg09854620
## 200223270003_R02C01  0.3466611  0.5137741  0.2836710  0.9026993  0.5707441  0.5220587
## 200223270003_R03C01  0.5866750  0.8540746  0.9172023  0.9112100  0.9168327  0.8739646
## 200223270003_R06C01  0.4046471  0.8188126  0.9168166  0.8952031  0.6042521  0.8973149
## 200223270003_R07C01  0.6014355  0.4625776  0.9188336  0.9268852  0.9062231  0.8958863
## 200223270006_R01C01  0.5774881  0.4690086  0.1971116  0.1118337  0.9083515  0.9075331
## 200223270006_R04C01  0.8756826  0.4547219  0.9030919  0.4174370  0.6383270  0.9318820
##                     cg19503462 cg26757229 cg06378561 cg01680303 cg06715136 cg15535896
## 200223270003_R02C01  0.7951675  0.6723726  0.9389306  0.5095174  0.3400192  0.3382952
## 200223270003_R03C01  0.4537684  0.1422661  0.9377503  0.1344941  0.9259109  0.9253926
## 200223270003_R06C01  0.6997359  0.7933794  0.5154019  0.7573869  0.9079807  0.3320191
## 200223270003_R07C01  0.7189778  0.8074830  0.9403569  0.4772204  0.6782105  0.9409104
## 200223270006_R01C01  0.7301755  0.5265692  0.4956816  0.1176263  0.8369052  0.9326027
## 200223270006_R04C01  0.4207207  0.7341953  0.9268832  0.5133033  0.8807568  0.9156401
##                     cg00322003 cg27341708 cg03084184 cg26219488 cg18339359 cg06697310
## 200223270003_R02C01  0.1759911 0.48846610  0.8162981  0.9336638  0.8824858  0.8454609
## 200223270003_R03C01  0.5702070 0.02613847  0.7877128  0.9134707  0.9040272  0.8653044
## 200223270003_R06C01  0.3077122 0.86893582  0.4546397  0.9261878  0.8552121  0.2405168
## 200223270003_R07C01  0.6104341 0.02642300  0.7812413  0.9217866  0.3073106  0.8479193
## 200223270006_R01C01  0.6147419 0.47573455  0.7818230  0.4929692  0.8973742  0.8206613
## 200223270006_R04C01  0.2293759 0.89411974  0.7725853  0.9431574  0.2292800  0.7839595
##                     cg10369879 cg10738648 cg06536614 cg26069044 cg20685672 cg03327352
## 200223270003_R02C01  0.9218784 0.44931577  0.5824474 0.92401867 0.67121006  0.8851712
## 200223270003_R03C01  0.3149306 0.49894016  0.5746694 0.94072227 0.79320906  0.8786878
## 200223270003_R06C01  0.9141081 0.05552024  0.5773468 0.93321315 0.66136456  0.3042310
## 200223270003_R07C01  0.9054415 0.03730440  0.5848917 0.56567694 0.80838304  0.8273211
## 200223270006_R01C01  0.2917862 0.54952781  0.5669919 0.94369927 0.08291414  0.8774082
## 200223270006_R04C01  0.9200403 0.59358167  0.5718514 0.02040391 0.84460055  0.8829492
##                     cg00999469 cg23658987 cg05841700 cg01667144 cg15865722 cg13885788
## 200223270003_R02C01  0.3274080 0.79757644  0.2923544  0.8971484 0.89438595  0.9380618
## 200223270003_R03C01  0.2857719 0.07511718  0.9146488  0.3175389 0.90194372  0.9369476
## 200223270003_R06C01  0.2499229 0.10177571  0.3737990  0.9238364 0.92118977  0.5163017
## 200223270003_R07C01  0.2819622 0.46747992  0.5046468  0.8739442 0.09230759  0.9183376
## 200223270006_R01C01  0.2933539 0.76831297  0.8419031  0.2931961 0.93422668  0.5525542
## 200223270006_R04C01  0.2966623 0.08988532  0.9286652  0.8616530 0.92220002  0.9328289
##                     cg14527649 cg23161429 cg20370184 cg18821122 cg07523188 cg12534577
## ... (remaining CpG beta-value columns of this preview, for samples 200223270003_R02C01 through 200223270006_R04C01, omitted)

9.1.2 Logistic Regression Model

9.1.2.1 Logistic Regression Model Training

df_LRM1 <- processed_data              # processed data frame from the preprocessing steps above
featureName_LRM1 <- AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)                          # reproducible train/test split
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)  # stratified 70/30 split on DX
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 455 156
dim(testData)
## [1] 193 156
ctrl <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData, type = "raw")
cm_FeatEval_Mean_LRM1 <- caret::confusionMatrix(predictions, testData$DX)

print(cm_FeatEval_Mean_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       46        7  14
##   Dementia  3       10   4
##   MCI      17       11  81
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7098          
##                  95% CI : (0.6403, 0.7728)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 2.018e-08       
##                                           
##                   Kappa : 0.4987          
##                                           
##  Mcnemar's Test P-Value : 0.1607          
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6970         0.35714     0.8182
## Specificity             0.8346         0.95758     0.7021
## Pos Pred Value          0.6866         0.58824     0.7431
## Neg Pred Value          0.8413         0.89773     0.7857
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2383         0.05181     0.4197
## Detection Prevalence    0.3472         0.08808     0.5648
## Balanced Accuracy       0.7658         0.65736     0.7602
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Mean_LRM1_Accuracy <- cm_FeatEval_Mean_LRM1$overall["Accuracy"]
cm_FeatEval_Mean_LRM1_Kappa <- cm_FeatEval_Mean_LRM1$overall["Kappa"]

print(cm_FeatEval_Mean_LRM1_Accuracy)
##  Accuracy 
## 0.7098446
print(cm_FeatEval_Mean_LRM1_Kappa)
##     Kappa 
## 0.4987013
print(model_LRM1)
## glmnet 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001810831  0.6350263  0.3962356
##   0.10   0.0018108309  0.6460636  0.4102125
##   0.10   0.0181083090  0.6548792  0.4144240
##   0.55   0.0001810831  0.6263550  0.3765308
##   0.55   0.0018108309  0.6505792  0.4121576
##   0.55   0.0181083090  0.6483336  0.3870111
##   1.00   0.0001810831  0.6065010  0.3457739
##   1.00   0.0018108309  0.6394930  0.3907984
##   1.00   0.0181083090  0.5867925  0.2663062
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01810831.
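caret keeps the winning (alpha, lambda) pair in the fitted object, so the values reported above can also be retrieved programmatically rather than read off the printout. A minimal sketch, assuming `model_LRM1` as trained above:

```r
# CV-selected hyper-parameters (here: alpha = 0.1, lambda ~ 0.0181)
best <- model_LRM1$bestTune
print(best)

# Cross-validated Accuracy / Kappa for the winning row of the tuning grid
subset(model_LRM1$results,
       alpha == best$alpha & lambda == best$lambda)
```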
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)

FeatEval_Mean_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.96043956043956"
print(FeatEval_Mean_LRM1_trainAccuracy)
## [1] 0.9604396
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.6326693
FeatEval_Mean_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Mean_mean_accuracy_cv_LRM1)
## [1] 0.6326693
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
  # Binary-classification flags: choose the positive class for the ROC curve
  positive_class <- switch(as.character(METHOD_FEATURE_FLAG),
                           "3" = "CI",
                           "4" = "Dementia",
                           "5" = "MCI",
                           "6" = "Dementia")
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, positive_class],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)

  # One-versus-rest: ROC curve and AUC for each class against the rest
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  # Plot all curves with colours matching the legend (col = 2, 3, 4)
  plot(roc_curves[[1]], col = 2, lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8487
## The AUC value for class CN is: 0.8487235 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8312
## The AUC value for class Dementia is: 0.8311688 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8189
## The AUC value for class MCI is: 0.818934

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_LRM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8329421
print(FeatEval_Mean_LRM1_AUC)
## [1] 0.8329421
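As a cross-check on the one-versus-rest average above, pROC also offers `multiclass.roc()`, which computes a multi-class AUC directly from the class-probability matrix. A sketch, assuming `prob_predictions` and `testData` as above (this generalisation need not match the one-vs-rest mean exactly):

```r
library(pROC)
# Multi-class AUC computed directly from the probability data frame
mc_roc <- multiclass.roc(testData$DX, prob_predictions)
auc(mc_roc)
```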
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##                CN  Dementia    MCI
## PC1        90.424 1.000e+02  0.000
## PC2        46.616 7.877e+01  0.000
## PC3         6.073 0.000e+00 68.217
## cg00962106 63.057 1.183e+01 36.936
## cg02225060 23.027 1.263e+01 51.151
## cg14710850 49.621 8.391e+00 25.398
## cg27452255 49.050 1.786e+01 11.826
## cg02981548 26.232 5.626e+00 49.013
## cg08861434 48.681 0.000e+00 42.742
## cg19503462 25.906 4.812e+01  5.791
## cg07152869 27.973 4.673e+01  1.360
## cg16749614 11.547 1.797e+01 45.945
## cg05096415  1.413 4.492e+01 28.934
## cg23432430 44.233 3.509e+00 25.256
## cg17186592  3.088 4.200e+01 26.683
## cg00247094 15.875 4.167e+01 10.436
## cg09584650 41.421 6.534e+00 18.532
## cg11133939 24.203 1.687e-03 40.480
## cg16715186 39.188 7.688e+00 17.049
## cg03129555 12.446 3.860e+01  8.425
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
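For glmnet models, caret's `varImp()` ranks features by the absolute values of the regularised coefficients, rescaled to 0–100 within each class. The raw multinomial coefficients at the CV-selected lambda can be inspected directly if needed; a sketch, assuming `model_LRM1` as above:

```r
# One sparse coefficient matrix per class at the chosen lambda
coefs <- coef(model_LRM1$finalModel, s = model_LRM1$bestTune$lambda)
lapply(coefs, head)
```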

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM1)  
  
}
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
##             CN     Dementia        MCI    Feature MaxImportance
## 1   90.4236664 1.000000e+02  0.0000000        PC1   100.0000000
## 2   46.6158230 7.876514e+01  0.0000000        PC2    78.7651390
## 3    6.0732385 0.000000e+00 68.2173286        PC3    68.2173286
## 4   63.0567895 1.183052e+01 36.9364886 cg00962106    63.0567895
## 5   23.0265832 1.262802e+01 51.1506685 cg02225060    51.1506685
## 6   49.6209826 8.390836e+00 25.3977905 cg14710850    49.6209826
## 7   49.0496838 1.785809e+01 11.8258108 cg27452255    49.0496838
## 8   26.2323147 5.625925e+00 49.0125309 cg02981548    49.0125309
## 9   48.6806848 0.000000e+00 42.7421514 cg08861434    48.6806848
## 10  25.9055990 4.811555e+01  5.7906582 cg19503462    48.1155464
## 11  27.9726027 4.672801e+01  1.3602948 cg07152869    46.7280077
## 12  11.5469865 1.796701e+01 45.9447524 cg16749614    45.9447524
## 13   1.4125525 4.491749e+01 28.9342186 cg05096415    44.9174899
## 14  44.2328180 3.508617e+00 25.2561347 cg23432430    44.2328180
## 15   3.0875872 4.199779e+01 26.6830942 cg17186592    41.9977939
## 16  15.8745094 4.166997e+01 10.4359896 cg00247094    41.6699659
## 17  41.4211230 6.534278e+00 18.5319272 cg09584650    41.4211230
## 18  24.2034820 1.687251e-03 40.4800617 cg11133939    40.4800617
## 19  39.1879982 7.688370e+00 17.0487674 cg16715186    39.1879982
## 20  12.4459376 3.860234e+01  8.4248631 cg03129555    38.6023357
## 21   3.1921180 2.009614e+01 38.4787320 cg08857872    38.4787320
## 22  12.1315784 3.682837e+01 11.1247121 cg06864789    36.8283695
## 23   0.0000000 3.530183e+01 26.7385538 cg14924512    35.3018323
## 24   7.2101094 1.187214e+01 34.9148750 cg16652920    34.9148750
## 25  19.1219402 3.459956e+01  0.0000000 cg03084184    34.5995575
## 26   3.6609079 1.335948e+01 34.1622823 cg26219488    34.1622823
## 27  13.4822639 3.380877e+01  6.0644062 cg20913114    33.8087688
## 28   7.1343871 3.346938e+01 11.8191158 cg06378561    33.4693793
## 29  33.3214329 1.548017e+01  2.1007087 cg26948066    33.3214329
## 30   0.5721452 3.328675e+01 17.4675204 cg25259265    33.2867495
## 31  33.2597536 0.000000e+00 21.5563401 cg06536614    33.2597536
## 32   1.6480480 3.232505e+01 17.2508040 cg24859648    32.3250481
## 33  12.7630119 3.077963e+01  2.2041139 cg12279734    30.7796293
## 34  30.6939618 1.116751e+01  2.4904135 cg03982462    30.6939618
## 35   1.2191910 3.061511e+01 16.6088463 cg05841700    30.6151075
## 36  29.8316998 7.646344e+00  7.7265359 cg11227702    29.8316998
## 37  25.3622898 0.000000e+00 29.0155964 cg12146221    29.0155964
## 38   9.6427110 8.951059e+00 28.9324214 cg02621446    28.9324214
## 39   0.0000000 2.259323e+01 28.8392975 cg00616572    28.8392975
## 40  28.4402938 8.977561e+00  6.5457544 cg15535896    28.4402938
## 41  25.4671036 0.000000e+00 28.2002270 cg02372404    28.2002270
## 42   5.0584162 2.777780e+01  8.1363657 cg09854620    27.7777988
## 43  27.6105358 0.000000e+00 15.8569776 cg04248279    27.6105358
## 44   3.9947457 7.707938e+00 27.5383597 cg20678988    27.5383597
## 45   0.0000000 2.752900e+01 13.8294115 cg24861747    27.5290027
## 46  27.4710197 1.566117e+01  0.0000000 cg10240127    27.4710197
## 47   7.7716675 7.237123e+00 27.2251410 cg16771215    27.2251410
## 48   0.6477691 2.697150e+01 14.6478759 cg01667144    26.9715010
## 49  26.9373979 8.943869e+00  2.8090450 cg13080267    26.9373979
## 50   0.0000000 2.615370e+01 26.5923694 cg02494911    26.5923694
## 51   9.3817536 2.645606e+01  5.1251747 cg10750306    26.4560604
## 52  25.4583653 1.204429e+00 11.2684574 cg11438323    25.4583653
## 53   4.8711575 4.022583e+00 25.4129088 cg06715136    25.4129088
## 54  25.1290464 0.000000e+00 15.3740479 cg04412904    25.1290464
## 55   4.7625575 2.483766e+01  5.3951689 cg12738248    24.8376618
## 56  24.4006278 0.000000e+00 18.6839449 cg03071582    24.4006278
## 57   0.0000000 2.429556e+01 15.8184784 cg05570109    24.2955559
## 58  24.2246662 2.027283e+01  0.0000000 cg15775217    24.2246662
## 59   0.0000000 1.993766e+01 24.1861184 cg24873924    24.1861184
## 60   7.5582296 4.154304e+00 24.1289924 cg17738613    24.1289924
## 61  23.8214092 0.000000e+00 20.8147297 cg01921484    23.8214092
## 62   0.0000000 1.632160e+01 23.6854396 cg10369879    23.6854396
## 63   0.0000000 1.840061e+01 23.6420901 cg27341708    23.6420901
## 64   0.0000000 2.355222e+01 21.4288853 cg12534577    23.5522196
## 65   0.0000000 2.343045e+01 17.8269147 cg18821122    23.4304500
## 66   4.6170471 6.921189e+00 23.3527287 cg12682323    23.3527287
## 67  23.3209910 0.000000e+00 14.1833195 cg05234269    23.3209910
## 68  23.0307834 0.000000e+00 22.7938958 cg20685672    23.0307834
## 69  20.3680497 0.000000e+00 22.8562527 cg12228670    22.8562527
## 70  22.7069964 3.660633e+00  8.3346151 cg11331837    22.7069964
## 71   0.0000000 2.268753e+01 20.8599811 cg01680303    22.6875341
## 72  22.4129176 1.160843e+00 10.2276178 cg17421046    22.4129176
## 73  22.2738923 8.042670e+00  2.2622928 cg03088219    22.2738923
## 74  22.2627889 1.928880e+01  0.0000000 cg00322003    22.2627889
## 75  22.2407874 1.530789e+01  0.0000000 cg02356645    22.2407874
## 76   5.8928437 2.207499e+01  1.2617305 cg01013522    22.0749918
## 77  12.6196526 0.000000e+00 21.8196303 cg00272795    21.8196303
## 78  21.6367031 0.000000e+00 14.5418861 cg25758034    21.6367031
## 79   4.7726068 2.162589e+01  1.1857577 cg26474732    21.6258905
## 80   0.0000000 2.126494e+01 17.6339938 cg16579946    21.2649390
## 81   9.6070487 2.121677e+01  0.0000000 cg07523188    21.2167720
## 82  21.2090861 4.531801e+00  5.6485933 cg11187460    21.2090861
## 83   0.0000000 1.703369e+01 20.8087502 cg14527649    20.8087502
## 84   2.7288778 4.858758e+00 20.5395653 cg20370184    20.5395653
## 85  20.5238610 0.000000e+00 13.7146333 cg17429539    20.5238610
## 86   0.0000000 2.027184e+01 10.0202802 cg20507276    20.2718432
## 87   1.1829922 6.819762e+00 20.1949298 cg13885788    20.1949298
## 88   0.0000000 1.557801e+01 20.0711568 cg16178271    20.0711568
## 89   5.5958884 1.533155e+00 19.9939644 cg10738648    19.9939644
## 90   5.1484910 1.991679e+01  2.7511644 cg26069044    19.9167949
## 91   3.1995623 4.951416e+00 19.7913728 cg25879395    19.7913728
## 92  19.6367721 0.000000e+00 12.1257134 cg06112204    19.6367721
## 93   3.2337436 1.923270e+01  1.2688054 cg23161429    19.2327006
## 94  19.0290833 0.000000e+00  8.8811450 cg25436480    19.0290833
## 95  18.8963290 1.895416e+01  0.0000000 cg26757229    18.9541606
## 96  18.8489892 8.146546e+00  0.0000000 cg02932958    18.8489892
## 97   6.3385640 1.862396e+01  0.9542925 cg18339359    18.6239621
## 98  18.5833313 1.513099e+00  1.8899782 cg06950937    18.5833313
## 99  12.0414352 1.857722e+01  0.0000000 cg23916408    18.5772240
## 100  1.5261549 3.184899e+00 18.1654164 cg12784167    18.1654164
## 101 11.9154282 0.000000e+00 18.1155156 cg07480176    18.1155156
## 102  0.0000000 5.496876e+00 17.6957094 cg15865722    17.6957094
## 103 17.6745178 0.000000e+00 13.0402417 cg27577781    17.6745178
## 104 17.1561047 2.943098e+00  2.5244160 cg05321907    17.1561047
## 105 16.8696278 0.000000e+00  7.5600874 cg03660162    16.8696278
## 106 16.7547601 0.000000e+00  9.9115899 cg07138269    16.7547601
## 107 16.7359257 9.081285e-04  5.4548067 cg20139683    16.7359257
## 108  1.5127234 1.661837e+01  3.6050266 cg12284872    16.6183749
## 109 16.5320336 0.000000e+00 15.3309720 cg03327352    16.5320336
## 110  0.0000000 1.652355e+01 12.9102072 cg23658987    16.5235495
## 111  0.0000000 1.474794e+01 16.1731669 cg21854924    16.1731669
## 112 15.7781397 0.000000e+00  6.8410564 cg21697769    15.7781397
## 113 15.6679755 5.754754e+00  0.0000000 cg19512141    15.6679755
## 114 10.3149355 0.000000e+00 15.4737089 cg08198851    15.4737089
## 115  0.4260012 1.508768e+01  0.8270265 cg00675157    15.0876767
## 116  0.0000000 5.691150e+00 15.0114537 cg01153376    15.0114537
## 117  1.8023617 1.495677e+01  0.7652334 cg01933473    14.9567667
## 118 14.9041545 0.000000e+00  4.5865304 cg12776173    14.9041545
## 119  0.0000000 1.067475e+01 14.7131793 cg14564293    14.7131793
## 120 12.4078661 0.000000e+00 14.5714652 cg24851651    14.5714652
## 121  0.0000000 1.452429e+01  2.2532934 cg22274273    14.5242914
## 122 12.7839527 1.451759e+01  0.0000000 cg25561557    14.5175853
## 123 13.7937627 1.439434e+01  0.0000000 cg21209485    14.3943424
## 124  3.9002055 1.430129e+01  0.0000000 cg10985055    14.3012935
## 125  8.0836178 0.000000e+00 14.2414682 cg14293999    14.2414682
## 126  0.0000000 6.083721e+00 13.9742620 cg18819889    13.9742620
## 127  7.9121604 1.390587e+01  0.0000000 cg24506579    13.9058683
## 128 10.4879315 0.000000e+00 13.8167304 cg19377607    13.8167304
## 129  2.6273452 1.361436e+01  0.0000000 cg06697310    13.6143633
## 130 13.5716123 0.000000e+00 10.1626215 cg00696044    13.5716123
## 131  0.0000000 0.000000e+00 13.1070671 cg01549082    13.1070671
## 132  0.0000000 6.885929e+00 13.0744631 cg01128042    13.0744631
## 133  0.2711506 1.248014e+01  1.1549034 cg00999469    12.4801390
## 134  0.0000000 1.079026e+01 12.3791517 cg06118351    12.3791517
## 135  0.0000000 1.123953e+01 11.7870153 cg12012426    11.7870153
## 136 11.7355779 9.453096e+00  0.0000000 cg08584917    11.7355779
## 137 11.6965026 0.000000e+00 11.1694390 cg27272246    11.6965026
## 138  0.0000000 1.168019e+01  2.2462032 cg15633912    11.6801939
## 139 11.3472005 1.977304e+00  0.0000000 cg17906851    11.3472005
## 140  1.1947138 1.133359e+01  0.0000000 cg16788319    11.3335935
## 141  8.9803747 0.000000e+00 11.2948800 cg07028768    11.2948800
## 142  0.0000000 3.117611e+00 10.7425453 cg27086157    10.7425453
## 143  1.8005341 9.613392e+00  0.0000000 cg14240646     9.6133916
## 144  0.0000000 9.464076e+00  9.1968206 cg00154902     9.4640757
## 145  6.6622696 0.000000e+00  9.1080687 cg14307563     9.1080687
## 146  0.0000000 8.519950e+00  0.0000000 cg02320265     8.5199503
## 147  8.2042959 0.000000e+00  7.0539563 cg08779649     8.2042959
## 148  7.6562298 0.000000e+00  7.9627233 cg04664583     7.9627233
## 149  0.0000000 0.000000e+00  6.6052912 cg12466610     6.6052912
## 150  6.2519012 3.701166e+00  0.0000000 cg27639199     6.2519012
## 151  0.0000000 0.000000e+00  5.8434575 cg15501526     5.8434575
## 152  0.0000000 4.839811e+00  3.6619673 cg00689685     4.8398115
## 153  2.7986803 0.000000e+00  0.0787605 cg01413796     2.7986803
## 154  0.0000000 0.000000e+00  2.1277610 cg11247378     2.1277610
## 155  0.5214097 0.000000e+00  0.6361066    age.now     0.6361066
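The frequency / common-feature rule described in the version notes (take the top N features per model, count appearances, keep features seen in more than half of the models) can be expressed as a small helper. This is an illustrative sketch, not the code used below; `importance_list` is assumed to be a list of per-model data frames with `Feature` and `MaxImportance` columns like `importance_model_LRM1_df`:

```r
select_common_features <- function(importance_list, top_n = 40) {
  # 1. top-N feature names for each model
  top_sets <- lapply(importance_list, function(df) {
    head(df[order(-df$MaxImportance), "Feature"], top_n)
  })
  # 2. appearance frequency of each feature across the models
  freq <- table(unlist(top_sets))
  # 3. keep features that appear in more than half of the models
  names(freq[freq > length(importance_list) / 2])
}
```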
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}

if(METHOD_FEATURE_FLAG == 1){
  library(ggplot2)

  # use the namespaced reshape2::melt() to avoid the data.table redirect warning
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){

  print(importance_model_LRM1_df %>% head(20))
  print("The top 20 features ranked by maximum importance:")
  print(head(importance_model_LRM1_df, n = 20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes (Top 20)",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()

}
##           CN     Dementia       MCI    Feature MaxImportance
## 1  90.423666 1.000000e+02  0.000000        PC1     100.00000
## 2  46.615823 7.876514e+01  0.000000        PC2      78.76514
## 3   6.073238 0.000000e+00 68.217329        PC3      68.21733
## 4  63.056789 1.183052e+01 36.936489 cg00962106      63.05679
## 5  23.026583 1.262802e+01 51.150669 cg02225060      51.15067
## 6  49.620983 8.390836e+00 25.397790 cg14710850      49.62098
## 7  49.049684 1.785809e+01 11.825811 cg27452255      49.04968
## 8  26.232315 5.625925e+00 49.012531 cg02981548      49.01253
## 9  48.680685 0.000000e+00 42.742151 cg08861434      48.68068
## 10 25.905599 4.811555e+01  5.790658 cg19503462      48.11555
## 11 27.972603 4.672801e+01  1.360295 cg07152869      46.72801
## 12 11.546987 1.796701e+01 45.944752 cg16749614      45.94475
## 13  1.412552 4.491749e+01 28.934219 cg05096415      44.91749
## 14 44.232818 3.508617e+00 25.256135 cg23432430      44.23282
## 15  3.087587 4.199779e+01 26.683094 cg17186592      41.99779
## 16 15.874509 4.166997e+01 10.435990 cg00247094      41.66997
## 17 41.421123 6.534278e+00 18.531927 cg09584650      41.42112
## 18 24.203482 1.687251e-03 40.480062 cg11133939      40.48006
## 19 39.187998 7.688370e+00 17.048767 cg16715186      39.18800
## 20 12.445938 3.860234e+01  8.424863 cg03129555      38.60234
## [1] "The top 20 features ranked by maximum importance:"
##  [1] "PC1"        "PC2"        "PC3"        "cg00962106" "cg02225060" "cg14710850" "cg27452255"
##  [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555"

9.1.2.2 Model Diagnosis & Improvement

9.1.2.2.1 Class Imbalance
Class Imbalance Check
  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##       CN Dementia      MCI 
##      221       94      333
prop.table(table(df_LRM1$DX))
## 
##        CN  Dementia       MCI 
## 0.3410494 0.1450617 0.5138889
table(trainData$DX)
## 
##       CN Dementia      MCI 
##      155       66      234
prop.table(table(trainData$DX))
## 
##        CN  Dementia       MCI 
## 0.3406593 0.1450549 0.5142857
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number in the minority class. A high ratio indicates severe class imbalance.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the whole data set is:")
    ## [1] "The imbalance ratio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 3.542553
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the training data set is:")
    ## [1] "The imbalance ratio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 3.545455
  • Let’s run a Chi-squared test, which determines whether the class distribution deviates significantly from a balanced one. The p-value from the test indicates how significant the class imbalance is.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 132.4, df = 2, p-value < 2.2e-16
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 93.156, df = 2, p-value < 2.2e-16
Address Class Imbalance Using “SMOTE” (NOT FINALIZED, MAY NEED FURTHER IMPROVEMENT)
library(smotefamily)

# K = 5 nearest neighbours; dup_size = 1 generates one synthetic sample
# per original minority-class observation (so the Dementia class doubles)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)

# Extract the new balanced dataset and restore the outcome column name
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##       CN Dementia      MCI 
##      155      132      234
dim(balanced_data_LGR_1)
## [1] 521 156
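To confirm that SMOTE actually reduced the imbalance, we can recompute the imbalance ratio on the balanced data, the same check performed on the raw data above (a quick sketch reusing the `balanced_data_LGR_1` object):

```r
# Recompute the imbalance ratio after SMOTE (majority count / minority count)
class_counts_balanced <- table(balanced_data_LGR_1$DX)
imbalance_ratio_balanced <- max(class_counts_balanced) / min(class_counts_balanced)
print("The imbalance ratio of the SMOTE-balanced data set is:")
print(imbalance_ratio_balanced)
# With class counts 155 / 132 / 234, this is 234/132, roughly 1.77,
# down from about 3.55 before SMOTE
```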
Fit Model with Balanced Data
ctrl <- trainControl(method = "cv", number = 5)

model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       45        6  15
##   Dementia  4       11   6
##   MCI      17       11  78
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6943          
##                  95% CI : (0.6241, 0.7584)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 2.356e-07       
##                                           
##                   Kappa : 0.4779          
##                                           
##  Mcnemar's Test P-Value : 0.5733          
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6818         0.39286     0.7879
## Specificity             0.8346         0.93939     0.7021
## Pos Pred Value          0.6818         0.52381     0.7358
## Neg Pred Value          0.8346         0.90116     0.7586
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2332         0.05699     0.4041
## Detection Prevalence    0.3420         0.10881     0.5492
## Balanced Accuracy       0.7582         0.66613     0.7450
print(model_LRM2)
## glmnet 
## 
## 521 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 416, 417, 417, 417, 417 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa    
##   0.10   0.000186946  0.7083883  0.5523130
##   0.10   0.001869460  0.7121978  0.5563269
##   0.10   0.018694597  0.7180220  0.5649649
##   0.55   0.000186946  0.6987912  0.5369622
##   0.55   0.001869460  0.7102930  0.5525186
##   0.55   0.018694597  0.6872894  0.5142517
##   1.00   0.000186946  0.6834432  0.5136505
##   1.00   0.001869460  0.7026007  0.5416133
##   1.00   0.018694597  0.6468864  0.4489232
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0186946.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.958241758241758"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.6964347
importance_model_LRM2 <- varImp(model_LRM2)

print(importance_model_LRM2)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##                CN Dementia    MCI
## PC1        80.669  100.000  0.000
## PC2        38.820   80.718  0.000
## cg00962106 56.188    9.092 33.490
## PC3         7.545    0.000 55.850
## cg19503462 26.318   48.653  6.549
## cg27452255 47.894   21.175  8.084
## cg07152869 27.958   45.984  1.304
## cg05096415  3.341   45.589 28.316
## cg02225060 18.278   12.770 45.585
## cg14710850 45.321    8.650 21.700
## cg02981548 23.093    5.917 45.292
## cg08861434 44.860    0.000 36.593
## cg03129555 14.448   42.011 10.562
## cg23432430 41.985    6.884 20.286
## cg16749614  8.921   17.012 41.732
## cg17186592  3.593   40.123 25.160
## cg14924512  1.856   38.979 23.218
## cg09584650 38.236    7.583 15.073
## cg06864789 13.555   38.080 11.895
## cg03084184 19.825   37.842  3.062
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)

library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM2)  
  
}
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum importance
  # across the classes and add it as a MaxImportance column
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_model_LRM2_df)
  
}
##               CN     Dementia          MCI    Feature MaxImportance
## 1   80.669259842 100.00000000  0.000000000        PC1   100.0000000
## 2   38.819826123  80.71816175  0.000000000        PC2    80.7181617
## 3   56.188431205   9.09164715 33.489593631 cg00962106    56.1884312
## 4    7.544943839   0.00000000 55.849961855        PC3    55.8499619
## 5   26.318196432  48.65339365  6.548593830 cg19503462    48.6533937
## 6   47.893542555  21.17486031  8.084363477 cg27452255    47.8935426
## 7   27.958187019  45.98417901  1.303750081 cg07152869    45.9841790
## 8    3.341262898  45.58888440 28.316497949 cg05096415    45.5888844
## 9   18.278052161  12.77036198 45.585042685 cg02225060    45.5850427
## 10  45.320598074   8.65004585 21.700387368 cg14710850    45.3205981
## 11  23.092639735   5.91677683 45.292211019 cg02981548    45.2922110
## 12  44.859679163   0.00000000 36.593453540 cg08861434    44.8596792
## 13  14.448058636  42.01074159 10.561922277 cg03129555    42.0107416
## 14  41.985454200   6.88416969 20.286475569 cg23432430    41.9854542
## 15   8.920752156  17.01208508 41.732120444 cg16749614    41.7321204
## 16   3.592550338  40.12263291 25.159849272 cg17186592    40.1226329
## 17   1.855731768  38.97859755 23.218315159 cg14924512    38.9785975
## 18  38.236309354   7.58283248 15.073046350 cg09584650    38.2363094
## 19  13.554946716  38.08024109 11.894853106 cg06864789    38.0802411
## 20  19.824747398  37.84166204  3.061516033 cg03084184    37.8416620
## 21  21.488967819   0.51831348 37.518320557 cg11133939    37.5183206
## 22  13.594568831  37.19225285  9.121541523 cg00247094    37.1922529
## 23   0.537660714  20.67180038 35.708582637 cg08857872    35.7085826
## 24  35.474946045   7.95611218 14.040046921 cg16715186    35.4749460
## 25   4.940502697  35.05176837 17.442552868 cg24859648    35.0517684
## 26  14.088329742  34.55096035  5.434257605 cg12279734    34.5509604
## 27   1.732301196  34.09861514 18.438268058 cg25259265    34.0986151
## 28   8.421708924  34.05470276 11.639240231 cg06378561    34.0547028
## 29   2.324010869  13.34764031 31.973137867 cg26219488    31.9731379
## 30  12.467038988  31.58764570  5.780707189 cg20913114    31.5876457
## 31   5.484732273  11.24243460 31.368588674 cg16652920    31.3685887
## 32   1.408899970  30.96741176 17.374727709 cg05841700    30.9674118
## 33  29.670147976  14.07125259  0.805720137 cg26948066    29.6701480
## 34  28.731758811  12.27587513  0.036521290 cg03982462    28.7317588
## 35  28.243427299   8.07844067  6.650568053 cg11227702    28.2434273
## 36   6.457454462  28.03894248  8.127382196 cg09854620    28.0389425
## 37  27.473232929   0.00000000 21.551407733 cg06536614    27.4732329
## 38   7.543918226   9.69276964 27.091698371 cg02621446    27.0916984
## 39   0.000000000  26.98695797 24.136371725 cg02494911    26.9869580
## 40  20.449791085   0.00000000 26.626226562 cg12146221    26.6262266
## 41   0.000000000  25.79037246 26.599235660 cg00616572    26.5992357
## 42   9.535885104  26.42260873  5.646629735 cg10750306    26.4226087
## 43  26.161114130   7.86787376  6.044388665 cg15535896    26.1611141
## 44   1.140414878  25.92724802 13.643048905 cg01667144    25.9272480
## 45   0.000000000  25.63479737 13.465613434 cg24861747    25.6347974
## 46  25.544180258  15.09335718  0.000000000 cg10240127    25.5441803
## 47  24.123077324   0.00000000 25.104931027 cg02372404    25.1049310
## 48   1.108301582   8.18252039 25.042404517 cg06715136    25.0424045
## 49  24.823638654   0.00000000 16.133980825 cg20685672    24.8236387
## 50   0.000000000  24.76847756 14.644120460 cg05570109    24.7684776
## 51  24.742424464   0.00000000 13.437058583 cg04248279    24.7424245
## 52   4.019375709   5.52672503 24.335785443 cg20678988    24.3357854
## 53   0.000000000  24.19229754 18.406568128 cg12534577    24.1922975
## 54   0.000000000  24.13290147 15.849791626 cg16579946    24.1329015
## 55   4.819147175  24.10976268  5.710025426 cg12738248    24.1097627
## 56   6.534246461   5.92804640 24.066359926 cg16771215    24.0663599
## 57  24.001214719  10.16137182  0.028438699 cg13080267    24.0012147
## 58   5.506260122   5.66762058 23.059746909 cg17738613    23.0597469
## 59  22.316420364   6.53244240  5.660162074 cg11331837    22.3164204
## 60   0.000000000  22.27724618 17.226369352 cg01680303    22.2772462
## 61  22.209400690   0.00000000 13.206406219 cg04412904    22.2094007
## 62   0.000000000  22.07433613 14.947652373 cg18821122    22.0743361
## 63   3.420914645   7.32110136 22.052709331 cg12682323    22.0527093
## 64  22.037157524  16.26770399  0.000000000 cg02356645    22.0371575
## 65   0.000000000  20.82172312 22.015679299 cg24873924    22.0156793
## 66   0.000000000  15.83394476 22.004093274 cg10369879    22.0040933
## 67   6.478329203  21.71628843  0.933365662 cg01013522    21.7162884
## 68  16.474577405   0.00000000 21.596576172 cg12228670    21.5965762
## 69   7.510857942  21.11648791  0.000000000 cg07523188    21.1164879
## 70  21.107902395  18.07932642  0.000000000 cg15775217    21.1079024
## 71  20.979670490   0.00000000 16.905430808 cg03071582    20.9796705
## 72  20.954215725   0.00000000 12.112608571 cg05234269    20.9542157
## 73   0.000000000  20.89247737  7.906785371 cg20507276    20.8924774
## 74   0.000000000  19.10583911 20.819551819 cg27341708    20.8195518
## 75  13.168400188  20.44807173  0.000000000 cg25561557    20.4480717
## 76  20.438204928   8.86601949  0.351354534 cg03088219    20.4382049
## 77  20.385646876   0.00000000 19.552513738 cg01921484    20.3856469
## 78   4.715214588  20.18112982  4.193800260 cg26069044    20.1811298
## 79  20.106732907   0.00000000  7.568483801 cg06112204    20.1067329
## 80  20.068564990   0.00000000 10.296917569 cg25758034    20.0685650
## 81  20.064020309   0.22221328  9.404859744 cg17421046    20.0640203
## 82  19.729246407   0.00000000 12.793223406 cg11438323    19.7292464
## 83  19.720993502   0.00000000  9.891621910 cg17429539    19.7209935
## 84  19.520193965  14.87193029  0.000000000 cg00322003    19.5201940
## 85  19.320209563   4.15170201  4.747309123 cg11187460    19.3202096
## 86   2.514891780   5.41107283 18.965540286 cg25879395    18.9655403
## 87   4.051251003  18.83920170  0.228544679 cg26474732    18.8392017
## 88   2.894319944  18.78261721  2.425510843 cg23161429    18.7826172
## 89   1.682510941   4.78560234 18.689792363 cg20370184    18.6897924
## 90  18.638351920   0.02063057  6.337509719 cg25436480    18.6383519
## 91   0.009426087   7.64134319 18.621465414 cg13885788    18.6214654
## 92  11.441363910  18.24527959  0.000000000 cg23916408    18.2452796
## 93   0.000000000  16.67120442 18.160627205 cg14527649    18.1606272
## 94   5.007978338   1.01499164 18.053255407 cg10738648    18.0532554
## 95   0.000000000  17.96109645 12.787393058 cg23658987    17.9610965
## 96   5.985911703  17.93935983  1.285450739 cg18339359    17.9393598
## 97  10.254358605   0.00000000 17.833226289 cg07480176    17.8332263
## 98   2.976699166  17.78268135  4.061910218 cg12284872    17.7826814
## 99  16.803580593  17.77605719  0.000000000 cg26757229    17.7760572
## 100  8.049371238  17.47305028  0.000000000 cg24506579    17.4730503
## 101 17.444256746   8.51373947  0.000000000 cg02932958    17.4442567
## 102 13.323139031   0.00000000 17.349212778 cg00272795    17.3492128
## 103  0.000000000   7.44782166 17.192451883 cg12784167    17.1924519
## 104 16.760296629   0.00000000  6.637118816 cg03660162    16.7602966
## 105  0.000000000  16.02256145 16.446415451 cg16178271    16.4464155
## 106 16.358446021   0.00000000 11.973566568 cg27577781    16.3584460
## 107 16.135595474   0.00000000  8.267086014 cg07138269    16.1355955
## 108 15.967517751   2.87983309  2.060507011 cg05321907    15.9675178
## 109  0.763644301  15.69596760  2.146109991 cg22274273    15.6959676
## 110  0.465140829   3.15694966 15.545670122 cg15865722    15.5456701
## 111 13.421528890  15.52970945  0.000000000 cg21209485    15.5297095
## 112 15.459967702   0.63930260  3.690621649 cg20139683    15.4599677
## 113  0.806019496  15.27068730  2.249971940 cg15633912    15.2706873
## 114  1.777742277  15.20212112  0.498002092 cg00675157    15.2021211
## 115  0.000000000  15.00877535 13.718828048 cg21854924    15.0087753
## 116  0.000000000   8.30855718 14.973459912 cg14564293    14.9734599
## 117  1.414245340  14.67010021  1.622457833 cg01933473    14.6701002
## 118 14.371970002   0.00000000  2.335595291 cg06950937    14.3719700
## 119  7.032260151   0.00000000 14.261003577 cg14293999    14.2610036
## 120  0.000000000   7.60390702 14.096008410 cg01128042    14.0960084
## 121 13.961256044   0.00000000  2.032447525 cg12776173    13.9612560
## 122 13.948299380   0.00000000 13.909155305 cg03327352    13.9482994
## 123  8.335142999   0.00000000 13.922313003 cg24851651    13.9223130
## 124 13.710854192   0.00000000  7.312289015 cg00696044    13.7108542
## 125  8.530049377   0.00000000 13.699914296 cg19377607    13.6999143
## 126  0.000000000   2.79684895 13.614447171 cg01153376    13.6144472
## 127 13.575135706   3.87959208  0.000000000 cg19512141    13.5751357
## 128  0.000000000   6.30757114 13.531294057 cg18819889    13.5312941
## 129  8.872915122   0.00000000 13.122201001 cg27272246    13.1222010
## 130 12.212109825   0.00000000 12.993192871 cg08198851    12.9931929
## 131  0.000000000   9.81433817 12.660695198 cg06118351    12.6606952
## 132  4.067801497  12.39688646  0.000000000 cg10985055    12.3968865
## 133  0.923845356  11.76399533  0.005381082 cg16788319    11.7639953
## 134  1.051827355  11.74956908  0.000000000 cg14240646    11.7495691
## 135  0.791232287  11.56067436  0.390564952 cg00999469    11.5606744
## 136  0.000000000  11.34437197 10.931309813 cg12012426    11.3443720
## 137  0.000000000   2.67641565 10.890005132 cg01549082    10.8900051
## 138 10.738506551   0.00000000  9.162934157 cg21697769    10.7385066
## 139 10.648814810   0.00000000  7.606004675 cg07028768    10.6488148
## 140 10.321683281   3.96413046  0.000000000 cg17906851    10.3216833
## 141  0.000000000   8.37428680  9.799369121 cg27086157     9.7993691
## 142  0.306298996   9.75465887  0.000000000 cg06697310     9.7546589
## 143  9.742996011   9.22757841  0.000000000 cg08584917     9.7429960
## 144  0.599453522   9.52207363  0.000000000 cg02320265     9.5220736
## 145  2.496774081   0.00000000  9.504663621 cg04664583     9.5046636
## 146  4.882144523   0.00000000  8.715422642 cg14307563     8.7154226
## 147  6.234496873   0.00000000  8.452431120 cg08779649     8.4524311
## 148  0.000000000   6.07518883  7.339581693 cg00154902     7.3395817
## 149  0.000000000   0.00000000  6.392651374 cg12466610     6.3926514
## 150  6.359265415   4.09026644  0.000000000 cg27639199     6.3592654
## 151  0.000000000   5.86623615  4.807421306 cg00689685     5.8662361
## 152  0.000000000   2.92271158  5.199320216 cg15501526     5.1993202
## 153  2.835382514   0.00000000  0.000000000 cg01413796     2.8353825
## 154  0.421134550   0.00000000  0.566104791    age.now     0.5661048
## 155  0.000000000   0.43969964  0.043762166 cg11247378     0.4396996
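The `pmax(CN, Dementia, MCI)` call in the chunk above hard-codes the three class names. A more general version (a sketch, assuming the importance data frame holds one numeric column per class plus the `Feature` column) takes the row-wise maximum over whatever class columns are present, so it also works if the set of diagnosis classes changes:

```r
# Row-wise maximum importance over all class columns, for any number of classes
class_cols <- setdiff(names(importance_model_LRM2_df), "Feature")
importance_model_LRM2_df$MaxImportance <-
  do.call(pmax, importance_model_LRM2_df[class_cols])
importance_model_LRM2_df <-
  importance_model_LRM2_df[order(-importance_model_LRM2_df$MaxImportance), ]
```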
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on the max-importance method:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##           CN   Dementia       MCI    Feature MaxImportance
## 1  80.669260 100.000000  0.000000        PC1     100.00000
## 2  38.819826  80.718162  0.000000        PC2      80.71816
## 3  56.188431   9.091647 33.489594 cg00962106      56.18843
## 4   7.544944   0.000000 55.849962        PC3      55.84996
## 5  26.318196  48.653394  6.548594 cg19503462      48.65339
## 6  47.893543  21.174860  8.084363 cg27452255      47.89354
## 7  27.958187  45.984179  1.303750 cg07152869      45.98418
## 8   3.341263  45.588884 28.316498 cg05096415      45.58888
## 9  18.278052  12.770362 45.585043 cg02225060      45.58504
## 10 45.320598   8.650046 21.700387 cg14710850      45.32060
## 11 23.092640   5.916777 45.292211 cg02981548      45.29221
## 12 44.859679   0.000000 36.593454 cg08861434      44.85968
## 13 14.448059  42.010742 10.561922 cg03129555      42.01074
## 14 41.985454   6.884170 20.286476 cg23432430      41.98545
## 15  8.920752  17.012085 41.732120 cg16749614      41.73212
## 16  3.592550  40.122633 25.159849 cg17186592      40.12263
## 17  1.855732  38.978598 23.218315 cg14924512      38.97860
## 18 38.236309   7.582832 15.073046 cg09584650      38.23631
## 19 13.554947  38.080241 11.894853 cg06864789      38.08024
## 20 19.824747  37.841662  3.061516 cg03084184      37.84166
## [1] "the top 20 features based on the max-importance method:"
##  [1] "PC1"        "PC2"        "cg00962106" "PC3"        "cg19503462" "cg27452255" "cg07152869"
##  [8] "cg05096415" "cg02225060" "cg14710850" "cg02981548" "cg08861434" "cg03129555" "cg23432430"
## [15] "cg16749614" "cg17186592" "cg14924512" "cg09584650" "cg06864789" "cg03084184"

if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8505
## The AUC value for class CN is: 0.850513 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8357
## The AUC value for class Dementia is: 0.8357143 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8188
## The AUC value for class MCI is: 0.8188266

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}
## The mean AUC value across all classes with one versus rest method is: 0.835018
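As a cross-check on the one-vs-rest loop above, `pROC` also provides `multiclass.roc()`, which computes a multi-class AUC directly from the class probability matrix (a sketch; it uses the Hand–Till pairwise formulation, so its value will generally differ somewhat from the one-vs-rest mean AUC reported here):

```r
library(pROC)
# Multi-class AUC (Hand & Till) computed directly from the probability matrix
mroc <- multiclass.roc(testData$DX, prob_predictions)
print(auc(mroc))
```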

9.1.3. Elastic Net

9.1.3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa     
##   0      0.00100000  0.6593476  0.42673396
##   0      0.05357895  0.6725349  0.43439423
##   0      0.10615789  0.6747338  0.43094148
##   0      0.15873684  0.6725599  0.42391171
##   0      0.21131579  0.6725837  0.41818370
##   0      0.26389474  0.6770526  0.42406079
##   0      0.31647368  0.6769804  0.41856449
##   0      0.36905263  0.6726087  0.40853473
##   0      0.42163158  0.6638170  0.38542265
##   0      0.47421053  0.6660148  0.38902178
##   0      0.52678947  0.6594214  0.37628816
##   0      0.57936842  0.6550252  0.36510400
##   0      0.63194737  0.6528274  0.35927177
##   0      0.68452632  0.6418618  0.33471759
##   0      0.73710526  0.6352200  0.31832804
##   0      0.78968421  0.6307756  0.30720022
##   0      0.84226316  0.6263800  0.29777058
##   0      0.89484211  0.6220322  0.28739881
##   0      0.94742105  0.6220322  0.28739881
##   0      1.00000000  0.6220322  0.28682520
##   1      0.00100000  0.6240596  0.37352512
##   1      0.05357895  0.5187546  0.05457313
##   1      0.10615789  0.5142862  0.00000000
##   1      0.15873684  0.5142862  0.00000000
##   1      0.21131579  0.5142862  0.00000000
##   1      0.26389474  0.5142862  0.00000000
##   1      0.31647368  0.5142862  0.00000000
##   1      0.36905263  0.5142862  0.00000000
##   1      0.42163158  0.5142862  0.00000000
##   1      0.47421053  0.5142862  0.00000000
##   1      0.52678947  0.5142862  0.00000000
##   1      0.57936842  0.5142862  0.00000000
##   1      0.63194737  0.5142862  0.00000000
##   1      0.68452632  0.5142862  0.00000000
##   1      0.73710526  0.5142862  0.00000000
##   1      0.78968421  0.5142862  0.00000000
##   1      0.84226316  0.5142862  0.00000000
##   1      0.89484211  0.5142862  0.00000000
##   1      0.94742105  0.5142862  0.00000000
##   1      1.00000000  0.5142862  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2638947.
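Note that `alpha = 0:1` only evaluates ridge (`alpha = 0`) and lasso (`alpha = 1`); no intermediate elastic-net mixtures are tried, and the selected model here is pure ridge. A finer grid (a sketch; the extra combinations make tuning proportionally slower) would let caret explore genuine elastic-net solutions:

```r
# Include intermediate alpha values so true elastic-net mixes are tuned
param_grid_fine <- expand.grid(alpha = seq(0, 1, by = 0.25),
                               lambda = seq(0.001, 1, length = 20))
elastic_net_model_fine <- caret::train(DX ~ ., data = trainData_ENM1,
                                       method = "glmnet",
                                       trControl = ctrl,
                                       tuneGrid = param_grid_fine)
```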
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
FeatEval_Mean_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Mean_mean_accuracy_cv_ENM1)
## [1] 0.5868952
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)

FeatEval_Mean_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.863736263736264"
print(FeatEval_Mean_ENM1_trainAccuracy)
## [1] 0.8637363
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Mean_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Mean_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       45        5  13
##   Dementia  0        8   0
##   MCI      21       15  86
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7202          
##                  95% CI : (0.6512, 0.7823)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 3.473e-09       
##                                           
##                   Kappa : 0.4987          
##                                           
##  Mcnemar's Test P-Value : 6.901e-05       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6818         0.28571     0.8687
## Specificity             0.8583         1.00000     0.6170
## Pos Pred Value          0.7143         1.00000     0.7049
## Neg Pred Value          0.8385         0.89189     0.8169
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2332         0.04145     0.4456
## Detection Prevalence    0.3264         0.04145     0.6321
## Balanced Accuracy       0.7700         0.64286     0.7429
cm_FeatEval_Mean_ENM1_Accuracy<-cm_FeatEval_Mean_ENM1$overall["Accuracy"]
cm_FeatEval_Mean_ENM1_Kappa<-cm_FeatEval_Mean_ENM1$overall["Kappa"]
print(cm_FeatEval_Mean_ENM1_Accuracy)
##  Accuracy 
## 0.7202073
print(cm_FeatEval_Mean_ENM1_Kappa)
##     Kappa 
## 0.4986772
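Since this version's header says the selected-feature performance will later be compared across the three selection methods, it can help to collect the metrics computed above into a single row for that comparison (a sketch; the summary data frame name is hypothetical, and only the objects already created above are used):

```r
# Hypothetical one-row summary for the later method-vs-method comparison
FeatEval_summary_ENM1 <- data.frame(
  Method        = "Mean",
  CV_MeanAcc    = FeatEval_Mean_mean_accuracy_cv_ENM1,
  TrainAccuracy = FeatEval_Mean_ENM1_trainAccuracy,
  TestAccuracy  = as.numeric(cm_FeatEval_Mean_ENM1_Accuracy),
  TestKappa     = as.numeric(cm_FeatEval_Mean_ENM1_Kappa)
)
print(FeatEval_summary_ENM1)
```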
importance_elastic_net_model1<- varImp(elastic_net_model1)


print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##               CN Dementia    MCI
## PC1        86.62  100.000 13.315
## PC2        68.42   88.610 20.131
## cg00962106 72.97   12.360 60.542
## cg02225060 43.14   18.830 62.030
## cg02981548 49.97    8.976 59.006
## cg23432430 57.29   15.758 41.468
## cg14710850 54.50    8.365 46.076
## cg16749614 20.68   33.681 54.421
## cg07152869 48.29   54.287  5.938
## cg08857872 29.00   24.418 53.479
## cg16652920 27.04   25.381 52.480
## cg26948066 51.16   42.094  9.007
## PC3        12.12   38.675 50.853
## cg08861434 48.61    1.032 49.702
## cg27452255 49.50   29.752 19.681
## cg09584650 48.12   20.551 27.501
## cg11133939 31.91   15.806 47.783
## cg19503462 47.24   44.918  2.255
## cg06864789 20.57   46.483 25.849
## cg02372404 30.74   14.687 45.489
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))

print(Ordered_importance_elastic_net_final_model1) 
  
}
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum importance
  # across the classes and add it as a MaxImportance column
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_elastic_net_model1_df)
  
}
##              CN     Dementia        MCI    Feature MaxImportance
## 1   86.62120555 1.000000e+02 13.3154938        PC1   100.0000000
## 2   68.41506593 8.860969e+01 20.1313229        PC2    88.6096894
## 3   72.96616233 1.236037e+01 60.5424892 cg00962106    72.9661623
## 4   43.13680835 1.882953e+01 62.0296384 cg02225060    62.0296384
## 5   49.96682179 8.975883e+00 59.0060055 cg02981548    59.0060055
## 6   57.28920981 1.575794e+01 41.4679663 cg23432430    57.2892098
## 7   54.50457856 8.365086e+00 46.0761921 cg14710850    54.5045786
## 8   20.67732060 3.368079e+01 54.4214110 cg16749614    54.4214110
## 9   48.28564292 5.428696e+01  5.9380180 cg07152869    54.2869615
## 10  28.99802672 2.441777e+01 53.4791022 cg08857872    53.4791022
## 11  27.03540858 2.538148e+01 52.4801847 cg16652920    52.4801847
## 12  51.16479077 4.209436e+01  9.0071343 cg26948066    51.1647908
## 13  12.11541799 3.867459e+01 50.8533134        PC3    50.8533134
## 14  48.60716543 1.031662e+00 49.7021276 cg08861434    49.7021276
## 15  49.49661951 2.975203e+01 19.6812848 cg27452255    49.4966195
## 16  48.11515416 2.055094e+01 27.5009105 cg09584650    48.1151542
## 17  31.91327900 1.580637e+01 47.7829449 cg11133939    47.7829449
## 18  47.23672408 4.491797e+01  2.2554575 cg19503462    47.2367241
## 19  20.57040391 4.648297e+01 25.8492634 cg06864789    46.4829680
## 20  30.73922491 1.468689e+01 45.4894183 cg02372404    45.4894183
## 21  13.69661443 4.531850e+01 31.5585879 cg24859648    45.3185029
## 22  10.38190144 3.472253e+01 45.1677367 cg14527649    45.1677367
## 23  44.71353025 3.266273e+01 11.9874954 cg03982462    44.7135302
## 24  43.77701152 1.498667e+01 28.7270404 cg06536614    43.7770115
## 25   0.05742514 4.329817e+01 43.1774484 cg17186592    43.2981742
## 26  26.35394431 1.675139e+01 43.1686358 cg26219488    43.1686358
## 27  42.96370404 1.408099e+01 28.8194115 cg10240127    42.9637040
## 28  13.43767401 4.289926e+01 29.3982818 cg00247094    42.8992564
## 29  35.47297173 6.860107e+00 42.3963796 cg20685672    42.3963796
## 30   3.59530245 4.215625e+01 38.4976467 cg25259265    42.1562498
## 31  42.14113645 1.425879e+01 27.8190416 cg16715186    42.1411365
## 32   0.72769110 4.194567e+01 41.1546792 cg05096415    41.9456709
## 33  34.83831577 4.176667e+01  6.8650561 cg15775217    41.7666725
## 34  15.96690586 4.058821e+01 24.5580004 cg24861747    40.5882068
## 35  34.02934141 6.216275e+00 40.3089166 cg07028768    40.3089166
## 36   4.42781931 3.973144e+01 35.2403198 cg14924512    39.7314398
## 37  24.97951953 3.964246e+01 14.5996349 cg03084184    39.6424550
## 38   4.47173997 3.907243e+01 34.5373846 cg05570109    39.0724252
## 39  34.88057483 4.000672e+00 38.9445475 cg01921484    38.9445475
## 40   9.76127248 2.779390e+01 37.6184714 cg00154902    37.6184714
## 41  28.32748759 3.744433e+01  9.0535424 cg26757229    37.4443306
## 42  37.35561895 9.847796e+00 27.4445223 cg03660162    37.3556189
## 43  35.88442086 5.170695e-01 36.4647910 cg12228670    36.4647910
## 44   4.42310262 3.173869e+01 36.2250936 cg00616572    36.2250936
## 45  14.11894393 3.616405e+01 21.9818063 cg20507276    36.1640508
## 46   5.45777159 3.544527e+01 29.9241978 cg05841700    35.4452700
## 47  21.86701375 1.351361e+01 35.4439265 cg06715136    35.4439265
## 48  22.83529876 1.227374e+01 35.1723403 cg02621446    35.1723403
## 49  18.36208785 3.501828e+01 16.5928893 cg12738248    35.0182778
## 50  14.22696004 3.493687e+01 20.6466078 cg09854620    34.9368684
## 51  32.22385216 3.482028e+01  2.5331259 cg00322003    34.8202787
## 52   8.08624317 2.660459e+01 34.7541343 cg24873924    34.7541343
## 53  14.18196370 3.469870e+01 20.4534323 cg03129555    34.6986966
## 54  34.67728863 7.588431e+00 27.0255567 cg04412904    34.6772886
## 55  15.01426631 1.956865e+01 34.6462192 cg17738613    34.6462192
## 56  18.92211499 1.559144e+01 34.5768598 cg25879395    34.5768598
## 57  34.34194321 1.088790e+01 23.3907383 cg05234269    34.3419432
## 58  22.74955912 3.407311e+01 11.2602539 cg20913114    34.0731137
## 59   1.10574447 3.257061e+01 33.7396527 cg02494911    33.7396527
## 60  17.46672005 3.350959e+01 15.9795697 cg00675157    33.5095904
## 61  26.91097526 3.346748e+01  6.4932061 cg12279734    33.4674820
## 62  12.81125802 2.054875e+01 33.4233079 cg01153376    33.4233079
## 63  30.29546006 2.966941e+00 33.3257016 cg04248279    33.3257016
## 64  30.64271695 3.320881e+01  2.5027880 cg06697310    33.2088056
## 65  19.20109381 1.362764e+01 32.8920329 cg16771215    32.8920329
## 66  25.57285198 3.289020e+01  7.2540515 cg26474732    32.8902041
## 67   1.21338315 3.269540e+01 31.4187171 cg12534577    32.6954009
## 68  14.55218922 3.243695e+01 17.8214588 cg06378561    32.4369487
## 69  19.19031334 1.316187e+01 32.4154803 cg18819889    32.4154803
## 70  29.77580872 3.222224e+01  2.3831323 cg01013522    32.2222417
## 71   8.93820972 2.321177e+01 32.2132838 cg10369879    32.2132838
## 72  31.33699934 9.314901e+00 21.9587973 cg03327352    31.3369993
## 73  31.30078323 8.697086e+00 22.5403967 cg07138269    31.3007832
## 74  30.28086989 7.153738e-01 31.0595443 cg12146221    31.0595443
## 75  31.01515677 1.154188e+01 19.4099769 cg11227702    31.0151568
## 76  30.50997690 2.051326e-01 30.7784101 cg27577781    30.7784101
## 77  30.73604217 2.929706e+01  1.3756812 cg02356645    30.7360422
## 78  10.88804539 1.960641e+01 30.5577524 cg15865722    30.5577524
## 79  21.12659334 3.052710e+01  9.3372037 cg18339359    30.5270977
## 80  21.72379588 3.049938e+01  8.7122800 cg08584917    30.4993765
## 81  30.48187501 1.623371e+01 14.1848668 cg15535896    30.4818750
## 82   9.34486828 3.034689e+01 20.9387194 cg01680303    30.3468883
## 83   0.66118138 2.956653e+01 30.2910133 cg01667144    30.2910133
## 84  17.55766315 2.993390e+01 12.3129407 cg07523188    29.9339044
## 85  12.71980384 1.708478e+01 29.8678851 cg21854924    29.8678851
## 86   9.99015500 2.974249e+01 19.6890322 cg10750306    29.7424878
## 87   5.72469553 2.961587e+01 23.8278786 cg16579946    29.6158747
## 88  29.45305133 5.870075e+00 23.5196762 cg11438323    29.4530513
## 89   7.90125584 2.936465e+01 21.4000924 cg18821122    29.3646489
## 90  13.47339382 1.551441e+01 29.0511081 cg01128042    29.0511081
## 91  12.43918028 1.650836e+01 29.0108418 cg14564293    29.0108418
## 92  28.69944088 4.408577e-01 28.1952826 cg08198851    28.6994409
## 93  25.92061288 2.700534e+00 28.6844472 cg00696044    28.6844472
## 94  28.64639261 7.487005e+00 21.0960870 cg17421046    28.6463926
## 95  28.22189323 1.423101e+01 13.9275795 cg11331837    28.2218932
## 96   4.57947370 2.318215e+01 27.8249201 cg12682323    27.8249201
## 97  27.75324966 2.314613e+01  4.5438216 cg02932958    27.7532497
## 98   2.23125343 2.770568e+01 25.4111304 cg23658987    27.7056844
## 99  13.54232520 1.406008e+01 27.6657079 cg07480176    27.6657079
## 100 18.99349539 8.561292e+00 27.6180885 cg10738648    27.6180885
## 101 23.24340158 4.224666e+00 27.5313687 cg03071582    27.5313687
## 102 27.50590369 1.371648e+01 13.7261241 cg25758034    27.5059037
## 103  8.31694986 1.850464e+01 26.8848892 cg06118351    26.8848892
## 104 26.47439884 2.668353e+01  0.1458283 cg19512141    26.6835278
## 105 15.77673761 2.662697e+01 10.7869322 cg23161429    26.6269705
## 106 13.98076844 2.639473e+01 12.3506587 cg11247378    26.3947278
## 107 18.59031320 7.685003e+00 26.3386166 cg20678988    26.3386166
## 108 14.36946682 1.154490e+01 25.9776676 cg27086157    25.9776676
## 109 25.84471361 9.776568e+00 16.0048448 cg03088219    25.8447136
## 110 13.62887699 2.527551e+01 11.5833356 cg22274273    25.2755132
## 111  2.73157059 2.236077e+01 25.1556395 cg13885788    25.1556395
## 112  7.97199814 1.668181e+01 24.7171056 cg14240646    24.7171056
## 113 23.64743847 7.878552e-01 24.4985943 cg06112204    24.4985943
## 114 24.37541925 4.910134e+00 19.4019850 cg17429539    24.3754193
## 115 23.05439563 2.435219e+01  1.2344903 cg25561557    24.3521866
## 116 21.11737928 3.135116e+00 24.3157963 cg14293999    24.3157963
## 117 15.52438712 8.640581e+00 24.2282683 cg19377607    24.2282683
## 118 21.13723634 2.411155e+01  2.9110134 cg06950937    24.1115504
## 119 24.09543447 4.091862e+00 19.9402723 cg25436480    24.0954345
## 120 14.61419073 9.017717e+00 23.6952080 cg00272795    23.6952080
## 121 10.00780864 1.338601e+01 23.4571172 cg12012426    23.4571172
## 122 23.37986405 1.718248e+01  6.1340826 cg05321907    23.3798640
## 123 23.15469253 9.974091e+00 13.1173009 cg20139683    23.1546925
## 124  0.72251260 2.312762e+01 22.3418067 cg26069044    23.1276199
## 125 21.02399526 2.241551e+01  1.3282102 cg23916408    22.4155061
## 126  0.60421044 2.223064e+01 21.5631268 cg27341708    22.2306378
## 127 15.96983051 2.220762e+01  6.1744927 cg13080267    22.2076238
## 128 21.86254895 1.299155e+00 20.5000930 cg27272246    21.8625490
## 129  0.95607645 2.184173e+01 20.8223545 cg12284872    21.8417315
## 130  2.40807486 2.169957e+01 19.2281930 cg00689685    21.6995684
## 131  2.01248824 2.152691e+01 19.4511227 cg16178271    21.5269115
## 132 21.27794768 8.125429e+00 13.0892177 cg21209485    21.2779477
## 133 20.58897269 1.059029e+01  9.9353779 cg24851651    20.5889727
## 134 20.33635501 7.328012e+00 12.9450419 cg21697769    20.3363550
## 135 20.33048061 6.214862e+00 14.0523177 cg04664583    20.3304806
## 136 14.64078345 1.993511e+01  5.2310225 cg00999469    19.9351066
## 137  2.26784740 1.742905e+01 19.7602009 cg20370184    19.7602009
## 138 18.98159250 4.184462e+00 14.7338294 cg11187460    18.9815925
## 139 18.43567987 1.998139e+00 16.3742404 cg12784167    18.4356799
## 140  1.20208781 1.698257e+01 18.2479601 cg02320265    18.2479601
## 141 17.49161956 1.357721e+01  3.8511068 cg12776173    17.4916196
## 142 17.27715467 1.271737e+00 15.9421167 cg08779649    17.2771547
## 143  8.18293699 8.988656e+00 17.2348934 cg01933473    17.2348934
## 144 17.18534234 8.948602e+00  8.1734396 cg15501526    17.1853423
## 145 13.77337037 1.693396e+01  3.0972842 cg10985055    16.9339552
## 146 16.16407186 6.750172e+00  9.3505990 cg17906851    16.1640719
## 147 11.29937443 4.707856e+00 16.0705306 cg14307563    16.0705306
## 148  4.33271953 1.431148e+01  9.9154553 cg16788319    14.3114754
## 149 11.34917622 1.384179e+01  2.4293151 cg24506579    13.8417919
## 150  9.52161846 1.242055e+01  2.8356319 cg27639199    12.4205510
## 151  1.91338340 1.029514e+01 12.2718262 cg12466610    12.2718262
## 152  9.00378501 2.189396e+00 11.2564819 cg15633912    11.2564819
## 153  0.00000000 1.116843e+01 11.2317321 cg01413796    11.2317321
## 154  1.45869467 1.880830e-01  1.7100783 cg01549082     1.7100783
## 155  0.70747253 6.047179e-03  0.7768203    age.now     0.7768203
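The max-across-classes ranking shown above can be sketched on toy data; the data frame and feature names below are illustrative only, not taken from the real model.

```r
# Toy sketch of the max-importance selection (illustrative names and values)
imp <- data.frame(
  CN       = c(80, 10, 40),
  Dementia = c(20, 90, 30),
  MCI      = c(50, 15, 70),
  row.names = c("cg_A", "cg_B", "cg_C")
)
imp$Feature <- rownames(imp)
# For each feature keep the largest class-wise importance, then sort on it
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)
imp <- imp[order(-imp$MaxImportance), ]
top_features <- head(imp$Feature, 2)
print(top_features)  # "cg_B" "cg_A"
```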
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_elastic_net_model1_df,n=20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##          CN   Dementia       MCI    Feature MaxImportance
## 1  86.62121 100.000000 13.315494        PC1     100.00000
## 2  68.41507  88.609689 20.131323        PC2      88.60969
## 3  72.96616  12.360372 60.542489 cg00962106      72.96616
## 4  43.13681  18.829529 62.029638 cg02225060      62.02964
## 5  49.96682   8.975883 59.006005 cg02981548      59.00601
## 6  57.28921  15.757943 41.467966 cg23432430      57.28921
## 7  54.50458   8.365086 46.076192 cg14710850      54.50458
## 8  20.67732  33.680790 54.421411 cg16749614      54.42141
## 9  48.28564  54.286962  5.938018 cg07152869      54.28696
## 10 28.99803  24.417775 53.479102 cg08857872      53.47910
## 11 27.03541  25.381475 52.480185 cg16652920      52.48018
## 12 51.16479  42.094356  9.007134 cg26948066      51.16479
## 13 12.11542  38.674595 50.853313        PC3      50.85331
## 14 48.60717   1.031662 49.702128 cg08861434      49.70213
## 15 49.49662  29.752034 19.681285 cg27452255      49.49662
## 16 48.11515  20.550943 27.500911 cg09584650      48.11515
## 17 31.91328  15.806365 47.782945 cg11133939      47.78294
## 18 47.23672  44.917966  2.255458 cg19503462      47.23672
## 19 20.57040  46.482968 25.849263 cg06864789      46.48297
## 20 30.73922  14.686893 45.489418 cg02372404      45.48942
## [1] "the top 20 features based on max way:"
##  [1] "PC1"        "PC2"        "cg00962106" "cg02225060" "cg02981548" "cg23432430" "cg14710850"
##  [8] "cg16749614" "cg07152869" "cg08857872" "cg16652920" "cg26948066" "PC3"        "cg08861434"
## [15] "cg27452255" "cg09584650" "cg11133939" "cg19503462" "cg06864789" "cg02372404"

if(METHOD_FEATURE_FLAG == 5){
  # Binary case: use the MCI class probability as the positive-class score
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value) 
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 6){
  # Binary case: use the Dementia class probability as the positive-class score
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value) 
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  # Binary case: use the CI class probability as the positive-class score
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value) 
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  # One-versus-rest ROC: treat each class in turn as the positive class
  for (class in classes) {
    binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # Use palette colors 2..(k+1) so the curves match the legend
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8682
## The AUC value for class CN is: 0.8681699 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8656
## The AUC value for class Dementia is: 0.8655844 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8361
## The AUC value for class MCI is: 0.8361272

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_ENM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8566272
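The one-versus-rest macro AUC computed above with pROC can be reproduced in base R via the rank-sum identity AUC = U / (n_pos * n_neg); the labels and scores below are simulated, not the ADNI test set.

```r
# Base-R sketch of one-vs-rest macro AUC (simulated data, not the real model)
auc_binary <- function(labels, scores) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  r <- rank(c(pos, neg))
  # Mann-Whitney U statistic divided by n_pos * n_neg
  (sum(r[seq_along(pos)]) - length(pos) * (length(pos) + 1) / 2) /
    (length(pos) * length(neg))
}

set.seed(1)
classes <- c("CN", "Dementia", "MCI")
y <- sample(classes, 60, replace = TRUE)
probs <- matrix(runif(180), ncol = 3, dimnames = list(NULL, classes))

aucs <- sapply(classes, function(cl) auc_binary(as.integer(y == cl), probs[, cl]))
mean_auc <- mean(aucs)  # macro-averaged one-vs-rest AUC
```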

9.1.4 XGBoost

9.1.4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)

# Set up a parallel backend, leaving one core free
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)

df_XGB1 <- processed_data 
featureName_XGB1 <- AfterProcess_FeatureName

# 70/30 stratified train/test split
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1 <- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]

cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa     
##   0.3  1          0.6               0.50        50      0.5604677  0.18305636
##   0.3  1          0.6               0.50       100      0.5692090  0.22745054
##   0.3  1          0.6               0.50       150      0.5801996  0.24875370
##   0.3  1          0.6               0.75        50      0.5517710  0.15720609
##   0.3  1          0.6               0.75       100      0.5649350  0.20438528
##   0.3  1          0.6               0.75       150      0.5628343  0.20763607
##   0.3  1          0.6               1.00        50      0.5122095  0.08805782
##   0.3  1          0.6               1.00       100      0.5452498  0.16811878
##   0.3  1          0.6               1.00       150      0.5606583  0.20408170
##   0.3  1          0.8               0.50        50      0.5692090  0.20998376
##   0.3  1          0.8               0.50       100      0.5846430  0.25722266
##   0.3  1          0.8               0.50       150      0.5779524  0.24804711
##   0.3  1          0.8               0.75        50      0.5428115  0.15376293
##   0.3  1          0.8               0.75       100      0.5583394  0.19598168
##   0.3  1          0.8               0.75       150      0.5650289  0.21626454
##   0.3  1          0.8               1.00        50      0.5210251  0.10441192
##   0.3  1          0.8               1.00       100      0.5430270  0.15979995
##   0.3  1          0.8               1.00       150      0.5562133  0.19365639
##   0.3  2          0.6               0.50        50      0.5384865  0.16265986
##   0.3  2          0.6               0.50       100      0.5669863  0.22298076
##   0.3  2          0.6               0.50       150      0.5693534  0.23182757
##   0.3  2          0.6               0.75        50      0.5583644  0.18712040
##   0.3  2          0.6               0.75       100      0.5937012  0.26024271
##   0.3  2          0.6               0.75       150      0.5826140  0.24618593
##   0.3  2          0.6               1.00        50      0.5408053  0.15939812
##   0.3  2          0.6               1.00       100      0.5516733  0.18052472
##   0.3  2          0.6               1.00       150      0.5649090  0.21413874
##   0.3  2          0.8               0.50        50      0.5869390  0.23990826
##   0.3  2          0.8               0.50       100      0.5824930  0.23552234
##   0.3  2          0.8               0.50       150      0.5758996  0.22732182
##   0.3  2          0.8               0.75        50      0.5934820  0.25135060
##   0.3  2          0.8               0.75       100      0.5912836  0.25346437
##   0.3  2          0.8               0.75       150      0.5978776  0.26886330
##   0.3  2          0.8               1.00        50      0.5540160  0.17911553
##   0.3  2          0.8               1.00       100      0.5451043  0.16459816
##   0.3  2          0.8               1.00       150      0.5670351  0.21007127
##   0.3  3          0.6               0.50        50      0.5650783  0.21060229
##   0.3  3          0.6               0.50       100      0.5606822  0.20850911
##   0.3  3          0.6               0.50       150      0.5629044  0.21133011
##   0.3  3          0.6               0.75        50      0.5780485  0.22322683
##   0.3  3          0.6               0.75       100      0.5956570  0.25481644
##   0.3  3          0.6               0.75       150      0.6022504  0.27090892
##   0.3  3          0.6               1.00        50      0.5781701  0.22288203
##   0.3  3          0.6               1.00       100      0.5869385  0.23776743
##   0.3  3          0.6               1.00       150      0.5803690  0.23301708
##   0.3  3          0.8               0.50        50      0.5450565  0.16059080
##   0.3  3          0.8               0.50       100      0.5650311  0.20879832
##   0.3  3          0.8               0.50       150      0.5737745  0.22394322
##   0.3  3          0.8               0.75        50      0.5516749  0.17172162
##   0.3  3          0.8               0.75       100      0.5715528  0.21250756
##   0.3  3          0.8               0.75       150      0.5692817  0.20717324
##   0.3  3          0.8               1.00        50      0.5605882  0.18804779
##   0.3  3          0.8               1.00       100      0.5606110  0.18873086
##   0.3  3          0.8               1.00       150      0.5649589  0.19776848
##   0.4  1          0.6               0.50        50      0.5318442  0.15010166
##   0.4  1          0.6               0.50       100      0.5670590  0.22666754
##   0.4  1          0.6               0.50       150      0.5670584  0.23727499
##   0.4  1          0.6               0.75        50      0.5385582  0.16666479
##   0.4  1          0.6               0.75       100      0.5758741  0.24000710
##   0.4  1          0.6               0.75       150      0.5649573  0.22879025
##   0.4  1          0.6               1.00        50      0.5409269  0.14921611
##   0.4  1          0.6               1.00       100      0.5585571  0.19773274
##   0.4  1          0.6               1.00       150      0.5694739  0.22587501
##   0.4  1          0.8               0.50        50      0.5430042  0.16638048
##   0.4  1          0.8               0.50       100      0.5584371  0.21323998
##   0.4  1          0.8               0.50       150      0.5738462  0.24123026
##   0.4  1          0.8               0.75        50      0.5606349  0.18669584
##   0.4  1          0.8               0.75       100      0.5496199  0.18521750
##   0.4  1          0.8               0.75       150      0.5803180  0.24933277
##   0.4  1          0.8               1.00        50      0.5343091  0.13804392
##   0.4  1          0.8               1.00       100      0.5474954  0.18343496
##   0.4  1          0.8               1.00       150      0.5605861  0.21083173
##   0.4  2          0.6               0.50        50      0.5295748  0.15350756
##   0.4  2          0.6               0.50       100      0.5604661  0.21432524
##   0.4  2          0.6               0.50       150      0.5583877  0.20905591
##   0.4  2          0.6               0.75        50      0.5714790  0.22465117
##   0.4  2          0.6               0.75       100      0.5539927  0.19184765
##   0.4  2          0.6               0.75       150      0.5605378  0.21420363
##   0.4  2          0.6               1.00        50      0.5671312  0.19986270
##   0.4  2          0.6               1.00       100      0.5626140  0.20310297
##   0.4  2          0.6               1.00       150      0.5825407  0.24632408
##   0.4  2          0.8               0.50        50      0.5495249  0.19461294
##   0.4  2          0.8               0.50       100      0.5650311  0.22003249
##   0.4  2          0.8               0.50       150      0.5760450  0.25188625
##   0.4  2          0.8               0.75        50      0.5497903  0.18106700
##   0.4  2          0.8               0.75       100      0.5847640  0.25087776
##   0.4  2          0.8               0.75       150      0.5694017  0.22664696
##   0.4  2          0.8               1.00        50      0.5692807  0.20731718
##   0.4  2          0.8               1.00       100      0.5826135  0.23323877
##   0.4  2          0.8               1.00       150      0.5912842  0.25931344
##   0.4  3          0.6               0.50        50      0.5690641  0.22158735
##   0.4  3          0.6               0.50       100      0.5670362  0.22452003
##   0.4  3          0.6               0.50       150      0.5582439  0.20883481
##   0.4  3          0.6               0.75        50      0.5670829  0.21530295
##   0.4  3          0.6               0.75       100      0.5846902  0.24552308
##   0.4  3          0.6               0.75       150      0.5780480  0.23934597
##   0.4  3          0.6               1.00        50      0.5691612  0.20525775
##   0.4  3          0.6               1.00       100      0.5801263  0.23012788
##   0.4  3          0.6               1.00       150      0.5800536  0.23200011
##   0.4  3          0.8               0.50        50      0.6022010  0.27466788
##   0.4  3          0.8               0.50       100      0.5978054  0.27332851
##   0.4  3          0.8               0.50       150      0.5911876  0.26561069
##   0.4  3          0.8               0.75        50      0.5824691  0.22001427
##   0.4  3          0.8               0.75       100      0.5891347  0.23859637
##   0.4  3          0.8               0.75       150      0.5890869  0.24245473
##   0.4  3          0.8               1.00        50      0.5604905  0.18793888
##   0.4  3          0.8               1.00       100      0.5627377  0.20109751
##   0.4  3          0.8               1.00       150      0.5715772  0.21562228
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter
##  'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma =
##  0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.75.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.5663579
FeatEval_Mean_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Mean_mean_accuracy_cv_xgb)
## [1] 0.5663579
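Note that the value printed above is the mean accuracy over all tuning combinations in the grid, which is systematically lower than the accuracy of the combination caret actually selects. A toy summary (made-up values, not the real grid) illustrates the distinction:

```r
# Toy caret-style tuning grid: grid-wide mean vs. the selected best row
results <- data.frame(
  eta = c(0.3, 0.3, 0.4), max_depth = c(1, 3, 3),
  Accuracy = c(0.560, 0.602, 0.591)
)
grid_mean <- mean(results$Accuracy)                 # analogue of the value above
best_row <- results[which.max(results$Accuracy), ]  # analogue of xgb_model$bestTune
```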
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Mean_xgb_trainAccuracy <- train_accuracy

print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Mean_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Mean_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Mean_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       37        9  15
##   Dementia  0        4   2
##   MCI      29       15  82
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6373          
##                  95% CI : (0.5652, 0.7051)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 0.0003288       
##                                           
##                   Kappa : 0.3436          
##                                           
##  Mcnemar's Test P-Value : 3.34e-05        
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.5606         0.14286     0.8283
## Specificity             0.8110         0.98788     0.5319
## Pos Pred Value          0.6066         0.66667     0.6508
## Neg Pred Value          0.7803         0.87166     0.7463
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.1917         0.02073     0.4249
## Detection Prevalence    0.3161         0.03109     0.6528
## Balanced Accuracy       0.6858         0.56537     0.6801
cm_FeatEval_Mean_xgb_Accuracy <-cm_FeatEval_Mean_xgb$overall["Accuracy"]
cm_FeatEval_Mean_xgb_Kappa <-cm_FeatEval_Mean_xgb$overall["Kappa"]

print(cm_FeatEval_Mean_xgb_Accuracy)
##  Accuracy 
## 0.6373057
print(cm_FeatEval_Mean_xgb_Kappa)
##     Kappa 
## 0.3435693
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 155)
## 
##            Overall
## age.now     100.00
## cg00962106   56.99
## cg09584650   53.51
## cg08857872   51.69
## cg14710850   48.76
## cg15501526   47.66
## cg02356645   47.36
## cg24861747   46.85
## cg03084184   46.62
## cg16771215   45.84
## cg02225060   45.82
## cg00154902   43.79
## cg03088219   43.41
## cg06864789   42.64
## cg02981548   42.60
## cg05234269   42.19
## cg17186592   41.63
## cg14293999   41.06
## cg01921484   40.81
## cg01013522   40.80
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df <- importance_xgb_model$importance

# Native xgboost importance: Gain, Cover and Frequency per feature
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

# xgb.plot.importance adds an Importance column (Gain by default); sort on it
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain        Cover    Frequency   Importance
##          <char>        <num>        <num>        <num>        <num>
##   1:    age.now 0.0335992468 0.0357993760 0.0162321692 0.0335992468
##   2: cg00962106 0.0192050561 0.0286619797 0.0152484014 0.0192050561
##   3: cg09584650 0.0180386412 0.0195146141 0.0113133301 0.0180386412
##   4: cg08857872 0.0174293205 0.0207729971 0.0118052140 0.0174293205
##   5: cg14710850 0.0164488979 0.0146623552 0.0118052140 0.0164488979
##  ---                                                               
## 151: cg20370184 0.0009022372 0.0006686374 0.0024594196 0.0009022372
## 152: cg00272795 0.0008131386 0.0010294737 0.0019675357 0.0008131386
## 153: cg12466610 0.0007405692 0.0008348438 0.0024594196 0.0007405692
## 154: cg20678988 0.0004758319 0.0018982431 0.0054107231 0.0004758319
## 155: cg27272246 0.0001311978 0.0004099423 0.0009837678 0.0001311978
stopCluster(c2)
registerDoSEQ()
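Per-model rankings like the two above feed the frequency / common-feature selection described in the version notes: take the top-N features from each trained model, count how often each feature appears, and keep those appearing in more than half of the models. A base-R sketch, with illustrative top lists rather than the real ones:

```r
# Hedged sketch of frequency / common-feature selection (illustrative lists)
top_elastic_net <- c("PC1", "PC2", "cg00962106", "cg02225060")
top_xgb         <- c("age.now", "cg00962106", "cg09584650", "cg02225060")
top_rf          <- c("PC1", "cg00962106", "cg02225060", "cg08857872")

top_lists <- list(top_elastic_net, top_xgb, top_rf)
# Count how many models selected each feature
freq <- table(unlist(top_lists))
# Keep features selected by more than half of the models
common_features <- names(freq[freq > length(top_lists) / 2])
print(common_features)
```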
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)

  for (class in classes) {
    binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  plot(roc_curves[[1]], col = "blue",
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # legend colors must match the plotted curves: "blue" for the first class,
  # then palette colors 3, 4, ... for the remaining ones
  legend("bottomright", legend = classes,
         col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.731
## The AUC value for class CN is: 0.7309711 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6333
## The AUC value for class Dementia is: 0.6333333 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.7237
## The AUC value for class MCI is: 0.7237266

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_xgb_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6960104
print(FeatEval_Mean_xgb_AUC)
## [1] 0.6960104
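The one-versus-rest evaluation above is repeated almost verbatim for the XGBoost, random forest, and SVM models in this section. A small helper could consolidate that loop. This is only a sketch, not part of the original pipeline: the name `ovr_macro_auc` is hypothetical, and it assumes the pROC package is attached and that the probability matrix's column names match the class levels.

```r
# Hypothetical helper consolidating the per-class one-versus-rest AUC loop.
# Assumes pROC; columns of `probs` must be named after the levels of `truth`.
library(pROC)

ovr_macro_auc <- function(truth, probs) {
  classes <- levels(truth)
  aucs <- sapply(classes, function(cls) {
    binary_labels <- ifelse(truth == cls, 1, 0)
    as.numeric(roc(binary_labels, probs[, cls], quiet = TRUE)$auc)
  })
  list(per_class = aucs, macro = mean(aucs))
}
```

With it, the block above would reduce to something like `res <- ovr_macro_auc(testData_XGB1$DX, prob_predictions)` followed by `res$macro`.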

9.1.5. Random Forest

9.1.5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)


print(rf_model)
## Random Forest 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##     2   0.5297930  0.03900252
##    78   0.5670845  0.15328626
##   155   0.5361687  0.09604486
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 78.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.5443487
FeatEval_Mean_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Mean_mean_accuracy_cv_rf)
## [1] 0.5443487
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")


train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Mean_rf_trainAccuracy<-train_accuracy
print(FeatEval_Mean_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Mean_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Mean_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       16        4   6
##   Dementia  0        0   0
##   MCI      50       24  93
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5648          
##                  95% CI : (0.4917, 0.6358)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 0.08547         
##                                           
##                   Kappa : 0.1467          
##                                           
##  Mcnemar's Test P-Value : 1.658e-13       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.2424          0.0000     0.9394
## Specificity             0.9213          1.0000     0.2128
## Pos Pred Value          0.6154             NaN     0.5569
## Neg Pred Value          0.7006          0.8549     0.7692
## Prevalence              0.3420          0.1451     0.5130
## Detection Rate          0.0829          0.0000     0.4819
## Detection Prevalence    0.1347          0.0000     0.8653
## Balanced Accuracy       0.5818          0.5000     0.5761
cm_FeatEval_Mean_rf_Accuracy<-cm_FeatEval_Mean_rf$overall["Accuracy"]
print(cm_FeatEval_Mean_rf_Accuracy)
##  Accuracy 
## 0.5647668
cm_FeatEval_Mean_rf_Kappa<-cm_FeatEval_Mean_rf$overall["Kappa"]
print(cm_FeatEval_Mean_rf_Kappa)
##     Kappa 
## 0.1467368
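The accuracy and kappa pulled from `overall` above have per-class counterparts in the confusionMatrix object's `byClass` matrix, which is where the sensitivities and balanced accuracies printed earlier live. A minimal sketch on toy predictions (the labels are hypothetical; requires caret):

```r
library(caret)

# Toy multi-class predictions (hypothetical), just to show the byClass layout
toy_pred <- factor(c("CN", "MCI", "MCI", "Dementia", "CN", "MCI"),
                   levels = c("CN", "Dementia", "MCI"))
toy_ref  <- factor(c("CN", "MCI", "CN", "Dementia", "MCI", "MCI"),
                   levels = c("CN", "Dementia", "MCI"))
toy_cm <- confusionMatrix(toy_pred, toy_ref)

# One row per class ("Class: CN", ...), with named metric columns
toy_cm$byClass[, c("Sensitivity", "Specificity", "Balanced Accuracy")]
```

The same indexing works on `cm_FeatEval_Mean_rf` to recover the per-class table programmatically instead of reading it off the printout.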
importance_rf_model <- varImp(rf_model)


print(importance_rf_model)
## rf variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##               CN Dementia    MCI
## cg15501526 69.03   18.177 100.00
## age.now    52.94   49.707  76.90
## cg01153376 36.87   52.235  64.43
## cg00962106 58.70    7.822  55.31
## cg06864789 18.69   56.208  36.56
## cg25259265 25.46   36.913  55.20
## cg12012426 25.46   23.629  52.64
## cg01013522 33.87   24.581  52.05
## cg08857872 22.43   51.916  49.54
## cg02494911 31.61   11.191  51.26
## cg04412904 50.30    2.344  25.61
## cg10985055 30.07   50.212  41.01
## cg11133939 49.41   28.237  34.88
## cg05234269 40.04   22.300  48.68
## cg02356645 18.63   48.341  27.81
## cg11438323 42.84   23.713  47.53
## cg06112204 30.93   47.297  38.54
## cg22274273 20.68   47.154  22.75
## cg16771215 23.06   17.890  47.07
## cg03088219 46.55   21.499  25.60
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if (METHOD_FEATURE_FLAG == 5){
  importance_rf_final_model <- varImp(rf_model$finalModel)

  library(dplyr)
  Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
  print(Ordered_importance_rf_final_model)
}
if (METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 6){
  importance_rf_final_model <- varImp(rf_model$finalModel)

  library(dplyr)
  Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
  print(Ordered_importance_rf_final_model)
}
if (METHOD_FEATURE_FLAG == 3){
  importance_rf_final_model <- varImp(rf_model$finalModel)

  library(dplyr)
  Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
  print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))


  print(importance_rf_model_df)
  
}
##            CN  Dementia        MCI    Feature MaxImportance
## 1   69.025719 18.177097 100.000000 cg15501526     100.00000
## 2   52.940101 49.706685  76.896452    age.now      76.89645
## 3   36.869599 52.234879  64.426116 cg01153376      64.42612
## 4   58.696248  7.821543  55.305665 cg00962106      58.69625
## 5   18.694844 56.207569  36.562529 cg06864789      56.20757
## 6   25.460182 36.913238  55.204486 cg25259265      55.20449
## 7   25.459089 23.628894  52.636632 cg12012426      52.63663
## 8   33.874554 24.580893  52.045094 cg01013522      52.04509
## 9   22.433376 51.915676  49.544823 cg08857872      51.91568
## 10  31.611483 11.190557  51.263895 cg02494911      51.26390
## 11  50.304236  2.344009  25.606998 cg04412904      50.30424
## 12  30.066302 50.212412  41.005153 cg10985055      50.21241
## 13  49.414407 28.237257  34.882248 cg11133939      49.41441
## 14  40.044177 22.300343  48.676628 cg05234269      48.67663
## 15  18.634563 48.341100  27.811350 cg02356645      48.34110
## 16  42.837342 23.712832  47.533366 cg11438323      47.53337
## 17  30.926398 47.297242  38.542443 cg06112204      47.29724
## 18  20.675659 47.154365  22.752082 cg22274273      47.15437
## 19  23.058906 17.889971  47.071727 cg16771215      47.07173
## 20  46.546641 21.499184  25.598649 cg03088219      46.54664
## 21  46.299156 24.383707  43.194730 cg25879395      46.29916
## 22  46.221945 40.628164  38.058129 cg06118351      46.22194
## 23  37.823728 45.468390  29.489845 cg05096415      45.46839
## 24  19.977113 45.432659   3.047480 cg07152869      45.43266
## 25  45.261122 10.706496  25.065316 cg00999469      45.26112
## 26  31.579313 26.028728  45.140149 cg23432430      45.14015
## 27  44.676075 20.998192  26.896975 cg14710850      44.67607
## 28  44.616056 41.480593  36.142432 cg17186592      44.61606
## 29  28.717895 44.566513  21.750447 cg00154902      44.56651
## 30  30.513255 44.091763  13.097988 cg16788319      44.09176
## 31  36.199622 42.674675  44.052700 cg11331837      44.05270
## 32  43.858843 29.740998  28.556901 cg20685672      43.85884
## 33  26.437345 42.765073  43.784802 cg03129555      43.78480
## 34  37.500447 14.107280  43.770592        PC2      43.77059
## 35  37.274618 36.739745  43.632531 cg13080267      43.63253
## 36  43.458896 15.822039  34.335056 cg20507276      43.45890
## 37  12.515130 43.449644  24.860904 cg03084184      43.44964
## 38  30.059870 25.487357  43.314275 cg02621446      43.31427
## 39  31.089206 26.854276  43.115162 cg02320265      43.11516
## 40  43.012848 30.906569  19.322140 cg17429539      43.01285
## 41  19.006107 19.029049  42.874501 cg20370184      42.87450
## 42  42.852496 34.882704  24.284788 cg00616572      42.85250
## 43  28.015046 42.724643  31.100165 cg01667144      42.72464
## 44  36.746150 42.285753  23.632463 cg19503462      42.28575
## 45  42.161211 41.618990  42.233668 cg14293999      42.23367
## 46  24.649693 39.956340  42.211846 cg06950937      42.21185
## 47  25.893410 42.189364  27.351884 cg16178271      42.18936
## 48  26.846531 20.739224  42.099121 cg12228670      42.09912
## 49  29.509575 42.035406  28.211285 cg10750306      42.03541
## 50  13.450261 41.727508  26.841960 cg24861747      41.72751
## 51  24.617711  6.911054  41.721486 cg01128042      41.72149
## 52  17.720290 31.935312  41.377142 cg00689685      41.37714
## 53  41.295230 20.948864  22.647020 cg17738613      41.29523
## 54  41.056089 40.016054  31.962280 cg27086157      41.05609
## 55  40.724803 34.122964  39.066960 cg01921484      40.72480
## 56  40.134678 26.430712  13.687010 cg12776173      40.13468
## 57  39.436174 28.509216  40.003148 cg10240127      40.00315
## 58  37.591823 33.612391  39.422725 cg26069044      39.42273
## 59  39.373458 22.645050  22.498601 cg10738648      39.37346
## 60  39.234371 35.604887  10.944789 cg18821122      39.23437
## 61  36.174663 30.393198  38.806667 cg14564293      38.80667
## 62  16.667810 30.499501  38.758682 cg16652920      38.75868
## 63  38.560108 35.742883  28.200941 cg21697769      38.56011
## 64  31.778177 38.498127  22.643868 cg01413796      38.49813
## 65  26.901479 38.486454  21.635240 cg15775217      38.48645
## 66  28.690335 38.406036  31.086838 cg24851651      38.40604
## 67  23.820878 38.116407  20.925077 cg14924512      38.11641
## 68  34.688524 38.085264  26.599561 cg16749614      38.08526
## 69  19.981965 27.810180  37.923333 cg12682323      37.92333
## 70   4.149119 20.574224  37.916308 cg09854620      37.91631
## 71  37.460302  9.901765  31.761301 cg15633912      37.46030
## 72  20.010266 20.719787  37.368421 cg04248279      37.36842
## 73  33.287053 37.226645  36.190064 cg14240646      37.22665
## 74  26.703136 37.116289  33.311032 cg00247094      37.11629
## 75  25.377940 36.626672  22.425621 cg25561557      36.62667
## 76  20.333794 36.564382  32.065371 cg14527649      36.56438
## 77  36.356828 28.241690  25.952427 cg18339359      36.35683
## 78  22.137789 36.198781  23.834773 cg23161429      36.19878
## 79  35.863431 13.807607  20.116686 cg21854924      35.86343
## 80  19.987368 27.391614  35.673944 cg02981548      35.67394
## 81  20.193951 35.305710  19.474929 cg06378561      35.30571
## 82  34.165716 35.191144  24.325520 cg04664583      35.19114
## 83  22.292643 35.135818  30.696590 cg12279734      35.13582
## 84  35.064168 12.339826  21.036389 cg16715186      35.06417
## 85   9.272829 31.203990  34.764027 cg01549082      34.76403
## 86  21.119569 34.728878  22.615180 cg12738248      34.72888
## 87  22.408002 34.692897  21.659519 cg14307563      34.69290
## 88  23.042034 13.574008  34.490826 cg03071582      34.49083
## 89  20.282595 30.621684  34.359566 cg15865722      34.35957
## 90  21.629305 33.960839  19.427351 cg24859648      33.96084
## 91  33.943863 33.580898  30.422424 cg23658987      33.94386
## 92  17.718889 33.908121   9.715448 cg27341708      33.90812
## 93   7.274082 33.903893  32.682399 cg17421046      33.90389
## 94  17.545703 25.351852  33.872028 cg07028768      33.87203
## 95  16.260143 33.847630   8.138731 cg02372404      33.84763
## 96  27.551991 33.205867  16.910630 cg20913114      33.20587
## 97  24.252896 33.168730  29.175380 cg06697310      33.16873
## 98  30.824030 33.142282   7.200007        PC1      33.14228
## 99  31.115880 33.106914  22.203223 cg26757229      33.10691
## 100 19.038944 32.886627  11.810018 cg26948066      32.88663
## 101 32.808660 27.907435  31.911504 cg19377607      32.80866
## 102 32.548751 15.391325  29.901839 cg18819889      32.54875
## 103 17.314251 28.711213  32.536841 cg02225060      32.53684
## 104 16.911844 32.396125  23.929062 cg13885788      32.39613
## 105 19.009425 22.224641  32.359812 cg12466610      32.35981
## 106 26.901023 32.234200  20.492517 cg24873924      32.23420
## 107 13.267439 21.407984  31.940465 cg02932958      31.94046
## 108 14.144413 31.715035  11.737783 cg12534577      31.71504
## 109  6.516654 27.971840  31.646693 cg20678988      31.64669
## 110 10.424939 31.609860  26.239056 cg12146221      31.60986
## 111 10.011114 31.501308  28.412194 cg00675157      31.50131
## 112 31.287358 16.421843   9.298470 cg25758034      31.28736
## 113 19.895459 31.081147  18.341435 cg11247378      31.08115
## 114 30.890997 13.775058  15.847767 cg01680303      30.89100
## 115 30.802689 18.119419  17.160935 cg12784167      30.80269
## 116 21.258584 30.648745  22.223083 cg15535896      30.64875
## 117 30.151766 29.954159  23.131955 cg08198851      30.15177
## 118 24.048430 13.279270  30.009761 cg23916408      30.00976
## 119 22.069094 29.859201  13.147028 cg27577781      29.85920
## 120 23.349590 29.813060  25.382960 cg03327352      29.81306
## 121 29.462696 17.536988   1.464224 cg05321907      29.46270
## 122 29.179307 16.741326  28.448292 cg27452255      29.17931
## 123 29.160140 28.128132  22.982201 cg00322003      29.16014
## 124 29.159431 13.005837  26.484901        PC3      29.15943
## 125 22.903287 28.955661   0.000000 cg12284872      28.95566
## 126 28.701494 25.622556  13.570544 cg21209485      28.70149
## 127 25.291716 26.489500  28.663591 cg26219488      28.66359
## 128 24.105011 28.560061  23.113178 cg27272246      28.56006
## 129 23.245817 28.366537  26.140540 cg19512141      28.36654
## 130 28.204013 20.953862  24.564416 cg26474732      28.20401
## 131 27.445247 20.287962  27.885571 cg03982462      27.88557
## 132 27.807253 25.895347  12.980296 cg11227702      27.80725
## 133 22.227400 27.499933  27.664774 cg20139683      27.66477
## 134 27.617145 13.209045  10.869218 cg08779649      27.61714
## 135 27.478548 11.512734  19.930825 cg01933473      27.47855
## 136 27.413810 25.109085  13.860779 cg09584650      27.41381
## 137  6.648521 27.048875  23.921760 cg07523188      27.04887
## 138  5.627298 21.246600  26.708911 cg06536614      26.70891
## 139 22.843345 13.539292  26.206196 cg17906851      26.20620
## 140 24.771697 22.371546  25.860501 cg27639199      25.86050
## 141 15.595035 20.526671  25.827393 cg07480176      25.82739
## 142 13.652947 25.802567  24.144964 cg00272795      25.80257
## 143 23.164857 25.510611   7.807607 cg05841700      25.51061
## 144 24.694456 25.471377  25.085184 cg06715136      25.47138
## 145 18.083380 12.418059  25.460612 cg08584917      25.46061
## 146 24.893961 23.820587  24.211358 cg25436480      24.89396
## 147 16.585777 23.568968  18.357646 cg05570109      23.56897
## 148 11.873910 23.215783   1.839512 cg03660162      23.21578
## 149 12.070995 22.972798  15.202215 cg16579946      22.97280
## 150 19.252382 16.162452  22.169831 cg07138269      22.16983
## 151  5.336951 21.106379  17.424658 cg11187460      21.10638
## 152 20.968838  2.668884   4.100231 cg00696044      20.96884
## 153 20.696309 17.271723  20.839111 cg08861434      20.83911
## 154 18.659976 17.001843  15.574271 cg10369879      18.65998
## 155 18.201929  2.798671  16.020454 cg24506579      18.20193
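The aggregation above keeps, for each feature, its largest importance across the three classes before ranking. On a toy importance table (hypothetical values; requires dplyr) the same `pmax`/`arrange` pattern looks like:

```r
library(dplyr)

# Hypothetical per-class importances for three features
toy_imp <- data.frame(
  CN       = c(10, 40, 25),
  Dementia = c(55, 20, 30),
  MCI      = c(15, 35, 60),
  row.names = c("cg_a", "cg_b", "cg_c")
)
toy_imp$Feature <- rownames(toy_imp)

# Keep the maximum importance per feature, then rank descending
toy_ranked <- toy_imp %>%
  mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
  arrange(desc(MaxImportance))
# Ranking: cg_c (60), cg_a (55), cg_b (40)
```

A feature therefore ranks high if it matters strongly for any one class, even if it is unimportant for the other two, which is the intended behavior for multi-class selection here.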
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_rf_model_df,n=20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##          CN  Dementia       MCI    Feature MaxImportance
## 1  69.02572 18.177097 100.00000 cg15501526     100.00000
## 2  52.94010 49.706685  76.89645    age.now      76.89645
## 3  36.86960 52.234879  64.42612 cg01153376      64.42612
## 4  58.69625  7.821543  55.30567 cg00962106      58.69625
## 5  18.69484 56.207569  36.56253 cg06864789      56.20757
## 6  25.46018 36.913238  55.20449 cg25259265      55.20449
## 7  25.45909 23.628894  52.63663 cg12012426      52.63663
## 8  33.87455 24.580893  52.04509 cg01013522      52.04509
## 9  22.43338 51.915676  49.54482 cg08857872      51.91568
## 10 31.61148 11.190557  51.26390 cg02494911      51.26390
## 11 50.30424  2.344009  25.60700 cg04412904      50.30424
## 12 30.06630 50.212412  41.00515 cg10985055      50.21241
## 13 49.41441 28.237257  34.88225 cg11133939      49.41441
## 14 40.04418 22.300343  48.67663 cg05234269      48.67663
## 15 18.63456 48.341100  27.81135 cg02356645      48.34110
## 16 42.83734 23.712832  47.53337 cg11438323      47.53337
## 17 30.92640 47.297242  38.54244 cg06112204      47.29724
## 18 20.67566 47.154365  22.75208 cg22274273      47.15437
## 19 23.05891 17.889971  47.07173 cg16771215      47.07173
## 20 46.54664 21.499184  25.59865 cg03088219      46.54664
## [1] "the top 20 features based on max way:"
##  [1] "cg15501526" "age.now"    "cg01153376" "cg00962106" "cg06864789" "cg25259265" "cg12012426"
##  [8] "cg01013522" "cg08857872" "cg02494911" "cg04412904" "cg10985055" "cg11133939" "cg05234269"
## [15] "cg02356645" "cg11438323" "cg06112204" "cg22274273" "cg16771215" "cg03088219"

if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")


  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")


  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")


  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)

  for (class in classes) {
    binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  plot(roc_curves[[1]], col = "blue",
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # legend colors must match the plotted curves: "blue" first, then 3, 4, ...
  legend("bottomright", legend = classes,
         col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.6964
## The AUC value for class CN is: 0.6963732 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6085
## The AUC value for class Dementia is: 0.6085498 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.6561
## The AUC value for class MCI is: 0.6560821

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    
    FeatEval_Mean_rf_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6536684
print(FeatEval_Mean_rf_AUC)
## [1] 0.6536684

9.1.6. SVM

9.1.6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 363, 364, 364, 365 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.7187907  0.5379918
##   0.50  0.7056033  0.5140213
##   1.00  0.7100228  0.5155945
## 
## Tuning parameter 'sigma' was held constant at a value of 0.003271835
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.003271835 and C = 0.25.
print(svm_model$bestTune)
##         sigma    C
## 1 0.003271835 0.25
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.7114723
FeatEval_Mean_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Mean_mean_accuracy_cv_svm)
## [1] 0.7114723
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.951648351648352"
FeatEval_Mean_svm_trainAccuracy <- train_accuracy
print(FeatEval_Mean_svm_trainAccuracy)
## [1] 0.9516484
predictions <- predict(svm_model, newdata = test_data_SVM1)

cm_FeatEval_Mean_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Mean_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       44        7  17
##   Dementia  3       16  10
##   MCI      19        5  72
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6839          
##                  95% CI : (0.6133, 0.7488)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 1.077e-06       
##                                           
##                   Kappa : 0.4755          
##                                           
##  Mcnemar's Test P-Value : 0.337           
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6667          0.5714     0.7273
## Specificity             0.8110          0.9212     0.7447
## Pos Pred Value          0.6471          0.5517     0.7500
## Neg Pred Value          0.8240          0.9268     0.7216
## Prevalence              0.3420          0.1451     0.5130
## Detection Rate          0.2280          0.0829     0.3731
## Detection Prevalence    0.3523          0.1503     0.4974
## Balanced Accuracy       0.7388          0.7463     0.7360
cm_FeatEval_Mean_svm_Accuracy <- cm_FeatEval_Mean_svm$overall["Accuracy"]
cm_FeatEval_Mean_svm_Kappa <- cm_FeatEval_Mean_svm$overall["Kappa"]
print(cm_FeatEval_Mean_svm_Accuracy)
##  Accuracy 
## 0.6839378
print(cm_FeatEval_Mean_svm_Kappa)
##     Kappa 
## 0.4754734

Let’s take a look at the feature importance of the trained model.

library(iml)

predictor_SVM <- Predictor$new(svm_model, data = df_SVM, y = df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM, loss = "ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 648 rows and 156 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg23432430     1.0481928   1.072289      1.081928         0.1373457
## 2 cg24859648     1.0385542   1.072289      1.091566         0.1373457
## 3 cg15535896     1.0530120   1.072289      1.081928         0.1373457
## 4 cg26948066     1.0192771   1.060241      1.101205         0.1358025
## 5 cg25879395     0.9975904   1.060241      1.091566         0.1358025
## 6 cg14924512     1.0385542   1.060241      1.060241         0.1358025
plot(importance_SVM)

library(vip)

vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
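`importance_SVM_df` holds the permutation results as a plain data frame with `feature` and `importance` columns, so selecting the Top N features is a simple sort-and-slice. A sketch on toy values (both the data and `Top_N` are hypothetical; requires dplyr):

```r
library(dplyr)

# Hypothetical FeatureImp-style results; the real ones come from
# importance_SVM$results (columns `feature` and `importance`)
toy_results <- data.frame(
  feature    = c("cg_a", "cg_b", "cg_c", "cg_d"),
  importance = c(1.07, 1.02, 1.11, 0.99)
)

Top_N <- 2  # hypothetical cutoff

# Sort descending by importance and keep the first Top_N feature names
top_features <- toy_results %>%
  arrange(desc(importance)) %>%
  slice_head(n = Top_N) %>%
  pull(feature)
# top_features: "cg_c", "cg_a"
```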
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc

  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc

  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
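The binary-classification branches above repeat the same ROC code and differ only in which probability column is treated as the positive class. A minimal sketch of a lookup that removes the duplication (`positive_class_for_flag` is a hypothetical helper; the flag-to-class pairing is copied from the branches above):

```r
# Map METHOD_FEATURE_FLAG (3-6: binary tasks) to the class whose
# predicted probability feeds roc(); flag 1 is the multi-class case.
positive_class_for_flag <- function(flag) {
  switch(as.character(flag),
         "3" = "CI",
         "4" = "Dementia",
         "5" = "MCI",
         "6" = "Dementia",
         stop("flag 1 is multi-class: use one-vs-rest over all classes"))
}
```

With this, a single block such as `roc(test_data_SVM1$DX, prob_predictions[, positive_class_for_flag(METHOD_FEATURE_FLAG)], ...)` would cover flags 3 through 6.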
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_SVM1$DX)

  # One-vs-rest: compute a binary ROC curve and AUC for each class
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  # Plot all curves; colours match the legend entries
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) > 66 cases (binary_labels 1).
## Area under the curve: 0.5173
## The AUC value for class CN is: 0.517299 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) > 28 cases (binary_labels 1).
## Area under the curve: 0.5478
## The AUC value for class Dementia is: 0.5478355 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) > 99 cases (binary_labels 1).
## Area under the curve: 0.5609
## The AUC value for class MCI is: 0.5609284

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_svm_AUC <- mean_auc
    
}
## The mean AUC value across all classes with one versus rest method is: 0.542021
print(FeatEval_Mean_svm_AUC)
## [1] 0.542021
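The one-vs-rest loop above leans on `pROC::roc` for each class; the macro-averaged AUC it reports can also be sketched in base R using the rank-statistic (Mann-Whitney) form of AUC. This is an illustration only, not the report's method; `auc_binary` and `macro_ovr_auc` are hypothetical helper names:

```r
# Rank-based (Mann-Whitney) AUC for one binary split; equals the area
# under the ROC curve when higher scores indicate the positive class.
auc_binary <- function(labels, scores) {
  r <- rank(scores)                       # ties get average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Macro-averaged one-vs-rest AUC, mirroring the METHOD_FEATURE_FLAG == 1
# branch: one binary AUC per class column of the probability matrix.
macro_ovr_auc <- function(truth, probs) {
  mean(sapply(colnames(probs), function(cl)
    auc_binary(as.integer(truth == cl), probs[, cl])))
}
```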

9.2 Features Selected Based on Median Importance

9.2.1 Input Features for Evaluation

Evaluate the performance of the output features selected by the median-importance method.

# Use the outputs of the median-based feature selection
processed_dataFrame <- df_selected_Median
processed_data <- output_median_feature

AfterProcess_FeatureName <- Selected_median_imp_Name
print(head(output_median_feature))
## # A tibble: 6 × 156
##   DX            PC1 cg00962106 cg16652920      PC3 cg27452255 cg08861434 cg06864789 cg08857872
##   <fct>       <dbl>      <dbl>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 MCI      -0.214        0.912      0.944 -0.0140       0.900      0.877     0.0537      0.340
## 2 CN       -0.173        0.538      0.943  0.00506      0.659      0.435     0.461       0.818
## 3 CN       -0.00367      0.504      0.946  0.0291       0.901      0.870     0.875       0.297
## 4 Dementia -0.187        0.904      0.942 -0.0323       0.890      0.471     0.490       0.295
## 5 MCI       0.0268       0.896      0.953  0.0529       0.578      0.862     0.479       0.894
## 6 CN       -0.0379       0.886      0.949 -0.00869      0.881      0.906     0.0542      0.890
## # ℹ 147 more variables: cg07152869 <dbl>, cg09584650 <dbl>, cg16749614 <dbl>, age.now <dbl>,
## #   cg05096415 <dbl>, cg23432430 <dbl>, cg01921484 <dbl>, cg02225060 <dbl>, cg02981548 <dbl>,
## #   cg14710850 <dbl>, cg19503462 <dbl>, PC2 <dbl>, cg17186592 <dbl>, cg00247094 <dbl>,
## #   cg11133939 <dbl>, cg25259265 <dbl>, cg16715186 <dbl>, cg05570109 <dbl>, cg26948066 <dbl>,
## #   cg02494911 <dbl>, cg14293999 <dbl>, cg14924512 <dbl>, cg02621446 <dbl>, cg03129555 <dbl>,
## #   cg04412904 <dbl>, cg26219488 <dbl>, cg00154902 <dbl>, cg20913114 <dbl>, cg03084184 <dbl>,
## #   cg12279734 <dbl>, cg01153376 <dbl>, cg16771215 <dbl>, cg04248279 <dbl>, cg06536614 <dbl>, …
print(Selected_median_imp_Name)
##   [1] "PC1"        "cg00962106" "cg16652920" "PC3"        "cg27452255" "cg08861434" "cg06864789"
##   [8] "cg08857872" "cg07152869" "cg09584650" "cg16749614" "age.now"    "cg05096415" "cg23432430"
##  [15] "cg01921484" "cg02225060" "cg02981548" "cg14710850" "cg19503462" "PC2"        "cg17186592"
##  [22] "cg00247094" "cg11133939" "cg25259265" "cg16715186" "cg05570109" "cg26948066" "cg02494911"
##  [29] "cg14293999" "cg14924512" "cg02621446" "cg03129555" "cg04412904" "cg26219488" "cg00154902"
##  [36] "cg20913114" "cg03084184" "cg12279734" "cg01153376" "cg16771215" "cg04248279" "cg06536614"
##  [43] "cg09854620" "cg06378561" "cg24859648" "cg10240127" "cg12228670" "cg03327352" "cg12146221"
##  [50] "cg03982462" "cg05841700" "cg15865722" "cg07523188" "cg11227702" "cg10369879" "cg16579946"
##  [57] "cg24861747" "cg14564293" "cg01128042" "cg00616572" "cg08198851" "cg17421046" "cg15535896"
##  [64] "cg18339359" "cg00322003" "cg02372404" "cg11331837" "cg23658987" "cg10738648" "cg25561557"
##  [71] "cg01667144" "cg05234269" "cg12534577" "cg06118351" "cg13885788" "cg10750306" "cg15775217"
##  [78] "cg01013522" "cg26474732" "cg27086157" "cg03088219" "cg15501526" "cg27577781" "cg11438323"
##  [85] "cg06715136" "cg17738613" "cg01680303" "cg06697310" "cg22274273" "cg12738248" "cg21854924"
##  [92] "cg14240646" "cg03071582" "cg24873924" "cg17429539" "cg06950937" "cg13080267" "cg27272246"
##  [99] "cg27341708" "cg18821122" "cg12682323" "cg12012426" "cg05321907" "cg20139683" "cg20685672"
## [106] "cg26757229" "cg25436480" "cg23916408" "cg20507276" "cg02356645" "cg07028768" "cg00272795"
## [113] "cg25758034" "cg16178271" "cg27639199" "cg11187460" "cg21209485" "cg14527649" "cg23161429"
## [120] "cg19512141" "cg02320265" "cg20370184" "cg12284872" "cg04664583" "cg11247378" "cg26069044"
## [127] "cg25879395" "cg00999469" "cg06112204" "cg02932958" "cg19377607" "cg12784167" "cg07480176"
## [134] "cg00696044" "cg18819889" "cg00689685" "cg00675157" "cg03660162" "cg10985055" "cg07138269"
## [141] "cg21697769" "cg08779649" "cg01933473" "cg17906851" "cg14307563" "cg12776173" "cg24851651"
## [148] "cg08584917" "cg16788319" "cg24506579" "cg01549082" "cg12466610" "cg15633912" "cg01413796"
## [155] "cg20678988"
print(head(df_selected_Median))
##                           DX          PC1 cg00962106 cg16652920          PC3 cg27452255
## 200223270003_R02C01      MCI -0.214185447  0.9124898  0.9436000 -0.014043316  0.9001010
## 200223270003_R03C01       CN -0.172761185  0.5375751  0.9431222  0.005055871  0.6593379
## 200223270003_R06C01       CN -0.003667305  0.5040948  0.9457161  0.029143653  0.9012217
## 200223270003_R07C01 Dementia -0.186779607  0.9039029  0.9419785 -0.032302430  0.8898635
## 200223270006_R01C01      MCI  0.026814649  0.8961556  0.9529417  0.052947950  0.5779792
## 200223270006_R04C01       CN -0.037862929  0.8857597  0.9492648 -0.008685676  0.8809143
##                     cg08861434 cg06864789 cg08857872 cg07152869 cg09584650 cg16749614  age.now
## 200223270003_R02C01  0.8768306 0.05369415  0.3395280  0.8284151 0.08230254  0.8678741 82.40000
## 200223270003_R03C01  0.4352647 0.46053125  0.8181845  0.5050630 0.09661586  0.8539348 78.60000
## 200223270003_R06C01  0.8698813 0.87513655  0.2970779  0.8352490 0.52399749  0.5874127 80.40000
## 200223270003_R07C01  0.4709249 0.49020327  0.2954090  0.5194300 0.11587211  0.5555391 78.16441
## 200223270006_R01C01  0.8618532 0.47852685  0.8935876  0.5025709 0.42115185  0.8026346 62.90000
## 200223270006_R04C01  0.9058965 0.05423587  0.8901338  0.8080916 0.56043178  0.7903978 80.67796
##                     cg05096415 cg23432430 cg01921484 cg02225060 cg02981548 cg14710850
## 200223270003_R02C01  0.9182527  0.9482702 0.90985496  0.6828159  0.1342571  0.8048592
## 200223270003_R03C01  0.5177819  0.9455418 0.90931369  0.8265195  0.5220037  0.8090950
## 200223270003_R06C01  0.6288426  0.9418716 0.92044873  0.5209552  0.5098965  0.8285902
## 200223270003_R07C01  0.6060271  0.9426559 0.91674311  0.8078889  0.5660985  0.8336457
## 200223270006_R01C01  0.5599588  0.9461736 0.02943747  0.6084903  0.5678714  0.8500725
## 200223270006_R04C01  0.5441200  0.9508404 0.89057041  0.7638781  0.5079859  0.8207247
##                     cg19503462           PC2 cg17186592 cg00247094 cg11133939 cg25259265
## 200223270003_R02C01  0.7951675  1.470293e-02  0.9230463  0.5399349  0.1282694  0.4356646
## 200223270003_R03C01  0.4537684  5.745834e-02  0.8593448  0.9315640  0.5920898  0.8893591
## 200223270003_R06C01  0.6997359  8.372861e-02  0.8467599  0.5177874  0.5127706  0.4201700
## 200223270003_R07C01  0.7189778 -1.117250e-02  0.4986373  0.5377765  0.8474176  0.4455517
## 200223270006_R01C01  0.7301755  1.650735e-05  0.8978999  0.9109309  0.8589133  0.8423337
## 200223270006_R04C01  0.4207207  1.571950e-02  0.9239750  0.5266535  0.5246557  0.8460736
##                     cg16715186 cg05570109 cg26948066 cg02494911 cg14293999 cg14924512
## 200223270003_R02C01  0.2742789  0.3466611  0.4685225  0.3049435  0.2836710  0.5303907
## 200223270003_R03C01  0.7946153  0.5866750  0.5026045  0.2416332  0.9172023  0.9160885
## 200223270003_R06C01  0.8124316  0.4046471  0.9101976  0.2520909  0.9168166  0.9088414
## 200223270003_R07C01  0.7773263  0.6014355  0.9379543  0.2457032  0.9188336  0.9081681
## 200223270006_R01C01  0.8334531  0.5774881  0.9120181  0.8045030  0.1971116  0.9111789
## 200223270006_R04C01  0.8039945  0.8756826  0.8868608  0.7489283  0.9030919  0.5331753
##                     cg02621446 cg03129555 cg04412904 cg26219488 cg00154902 cg20913114
## 200223270003_R02C01  0.8731313  0.6079616 0.05088595  0.9336638  0.5137741 0.36510482
## 200223270003_R03C01  0.8095534  0.5785498 0.07717659  0.9134707  0.8540746 0.80382984
## 200223270003_R06C01  0.7511582  0.9137818 0.08253743  0.9261878  0.8188126 0.03158439
## 200223270003_R07C01  0.8773609  0.9043041 0.06217431  0.9217866  0.4625776 0.81256840
## 200223270006_R01C01  0.2046541  0.9286357 0.11888769  0.4929692  0.4690086 0.81502059
## 200223270006_R04C01  0.7963817  0.9088564 0.08885846  0.9431574  0.4547219 0.90468830
##                     cg03084184 cg12279734 cg01153376 cg16771215 cg04248279 cg06536614
## 200223270003_R02C01  0.8162981  0.6435368  0.4872148 0.88389723  0.8534976  0.5824474
## 200223270003_R03C01  0.7877128  0.1494651  0.9639670 0.07196933  0.8458854  0.5746694
## 200223270003_R06C01  0.4546397  0.8760759  0.2242410 0.09949974  0.8332786  0.5773468
## 200223270003_R07C01  0.7812413  0.8674214  0.5155654 0.64234023  0.3303204  0.5848917
## 200223270006_R01C01  0.7818230  0.6454450  0.9588916 0.62679274  0.5966878  0.5669919
## 200223270006_R04C01  0.7725853  0.8660058  0.9586876 0.06970175  0.8939599  0.5718514
##                     cg09854620 cg06378561 cg24859648 cg10240127 cg12228670 cg03327352
## 200223270003_R02C01  0.5220587  0.9389306 0.83777536  0.9250553  0.8632174  0.8851712
## 200223270003_R03C01  0.8739646  0.9377503 0.44392797  0.9403255  0.8496212  0.8786878
## 200223270003_R06C01  0.8973149  0.5154019 0.03341185  0.9056974  0.8738949  0.3042310
## 200223270003_R07C01  0.8958863  0.9403569 0.43582347  0.9396217  0.8362189  0.8273211
## 200223270006_R01C01  0.9075331  0.4956816 0.03087161  0.9262370  0.8079694  0.8774082
## 200223270006_R04C01  0.9318820  0.9268832 0.02588024  0.9240497  0.6966666  0.8829492
##                     cg12146221 cg03982462 cg05841700 cg15865722 cg07523188 cg11227702
## 200223270003_R02C01  0.2049284  0.8562777  0.2923544 0.89438595  0.7509183 0.86486075
## 200223270003_R03C01  0.1814927  0.6023731  0.9146488 0.90194372  0.1524386 0.49184121
## 200223270003_R06C01  0.8619250  0.8778458  0.3737990 0.92118977  0.7127592 0.02543724
## 200223270003_R07C01  0.1238469  0.8860227  0.5046468 0.09230759  0.8464983 0.45150971
## 200223270006_R01C01  0.2021598  0.8703107  0.8419031 0.93422668  0.7847738 0.89086877
## 200223270006_R04C01  0.1383786  0.8792860  0.9286652 0.92220002  0.8231277 0.87675947
##                     cg10369879 cg16579946 cg24861747 cg14564293 cg01128042 cg00616572
## 200223270003_R02C01  0.9218784  0.6306315  0.3540897 0.52089591  0.9113420  0.9335067
## 200223270003_R03C01  0.3149306  0.6648766  0.4309505 0.04000662  0.5328806  0.9214079
## 200223270003_R06C01  0.9141081  0.6455081  0.8071462 0.04959460  0.5222757  0.9113633
## 200223270003_R07C01  0.9054415  0.8979650  0.3347317 0.03114773  0.5141721  0.9160238
## 200223270006_R01C01  0.2917862  0.6886498  0.3544795 0.51703196  0.9321215  0.4861334
## 200223270006_R04C01  0.9200403  0.6766907  0.5997840 0.51535010  0.5050081  0.9067928
##                     cg08198851 cg17421046 cg15535896 cg18339359 cg00322003 cg02372404
## 200223270003_R02C01  0.6578905  0.9026993  0.3382952  0.8824858  0.1759911 0.03598249
## 200223270003_R03C01  0.6578186  0.9112100  0.9253926  0.9040272  0.5702070 0.02767285
## 200223270003_R06C01  0.1272153  0.8952031  0.3320191  0.8552121  0.3077122 0.03127855
## 200223270003_R07C01  0.8351465  0.9268852  0.9409104  0.3073106  0.6104341 0.55685785
## 200223270006_R01C01  0.8791156  0.1118337  0.9326027  0.8973742  0.6147419 0.02587736
## 200223270006_R04C01  0.1423737  0.4174370  0.9156401  0.2292800  0.2293759 0.02828648
##                     cg11331837 cg23658987 cg10738648 cg25561557 cg01667144 cg05234269
## 200223270003_R02C01 0.03692842 0.79757644 0.44931577 0.76736369  0.8971484 0.93848584
## 200223270003_R03C01 0.57150125 0.07511718 0.49894016 0.03851635  0.3175389 0.57461229
## 200223270003_R06C01 0.03182862 0.10177571 0.05552024 0.47259480  0.9238364 0.02467208
## 200223270003_R07C01 0.03832164 0.46747992 0.03730440 0.43364249  0.8739442 0.56516794
## 200223270006_R01C01 0.93008298 0.76831297 0.54952781 0.46211439  0.2931961 0.94829529
## 200223270006_R04C01 0.54004452 0.08988532 0.59358167 0.44651530  0.8616530 0.56298286
##                     cg12534577 cg06118351 cg13885788 cg10750306 cg15775217 cg01013522
## 200223270003_R02C01  0.8585231 0.36339400  0.9380618 0.04919915  0.5707441  0.6251168
## 200223270003_R03C01  0.8493466 0.47148604  0.9369476 0.55160081  0.9168327  0.8862821
## 200223270003_R06C01  0.8395241 0.86559618  0.5163017 0.54694332  0.6042521  0.5425308
## 200223270003_R07C01  0.8511384 0.83494303  0.9183376 0.59824543  0.9062231  0.8429862
## 200223270006_R01C01  0.8804655 0.02632111  0.5525542 0.53158639  0.9083515  0.0480531
## 200223270006_R04C01  0.3029013 0.83329300  0.9328289 0.05646838  0.6383270  0.8240222
##                     cg26474732 cg27086157  cg03088219 cg15501526 cg27577781 cg11438323
## 200223270003_R02C01  0.7843252  0.9224112 0.844002862  0.6362531  0.8143535  0.4863471
## 200223270003_R03C01  0.8184088  0.9219304 0.007435243  0.6319253  0.8113185  0.8984559
## 200223270003_R06C01  0.7358417  0.3224986 0.120155222  0.7435100  0.8144274  0.8722772
## 200223270003_R07C01  0.7509296  0.3455486 0.826554308  0.7756577  0.7970617  0.5026756
## 200223270006_R01C01  0.8294938  0.8988962 0.066294915  0.3230777  0.8640044  0.8809646
## 200223270006_R04C01  0.8033167  0.9159217 0.574738383  0.8342695  0.8840237  0.8717937
##                     cg06715136 cg17738613 cg01680303 cg06697310 cg22274273 cg12738248
## 200223270003_R02C01  0.3400192  0.6879612  0.5095174  0.8454609  0.4209386 0.85430866
## 200223270003_R03C01  0.9259109  0.6582258  0.1344941  0.8653044  0.4246379 0.88010292
## 200223270003_R06C01  0.9079807  0.1022257  0.7573869  0.2405168  0.4196796 0.51121855
## 200223270003_R07C01  0.6782105  0.8960156  0.4772204  0.8479193  0.4164100 0.09131476
## 200223270006_R01C01  0.8369052  0.8850702  0.1176263  0.8206613  0.7951105 0.91529345
## 200223270006_R04C01  0.8807568  0.8481916  0.5133033  0.7839595  0.0229810 0.91911405
##                     cg21854924 cg14240646 cg03071582 cg24873924 cg17429539 cg06950937
## 200223270003_R02C01  0.8729132  0.5391334  0.9187811  0.3060635  0.7860900  0.8910968
## 200223270003_R03C01  0.7162342  0.2538363  0.5844421  0.8640985  0.7100923  0.2889345
## 200223270003_R06C01  0.7520990  0.1864902  0.6245558  0.8259149  0.7660838  0.9143801
## 200223270003_R07C01  0.8641284  0.6402007  0.9283683  0.8333940  0.6984969  0.8891079
## 200223270006_R01C01  0.6498895  0.7696079  0.5715416  0.8761177  0.6508597  0.8868617
## 200223270006_R04C01  0.5943113  0.1490028  0.6534650  0.8585363  0.2828452  0.9093273
##                     cg13080267 cg27272246 cg27341708 cg18821122 cg12682323 cg12012426
## 200223270003_R02C01 0.78936656  0.8615873 0.48846610  0.9291309  0.9397956  0.9165048
## 200223270003_R03C01 0.78371483  0.8705287 0.02613847  0.5901603  0.9003940  0.9434768
## 200223270003_R06C01 0.09436069  0.8103777 0.86893582  0.5779620  0.9157877  0.9220044
## 200223270003_R07C01 0.09351259  0.0310881 0.02642300  0.9251431  0.9048877  0.9241284
## 200223270006_R01C01 0.45173796  0.7686536 0.47573455  0.9217018  0.1065347  0.9327143
## 200223270006_R04C01 0.49866715  0.4403542 0.89411974  0.5412250  0.8836232  0.9271167
##                     cg05321907 cg20139683 cg20685672 cg26757229 cg25436480 cg23916408
## 200223270003_R02C01  0.2880477  0.8717075 0.67121006  0.6723726 0.84251599  0.1942275
## 200223270003_R03C01  0.1782629  0.9059433 0.79320906  0.1422661 0.49940321  0.9154993
## 200223270003_R06C01  0.8427929  0.8962554 0.66136456  0.7933794 0.34943119  0.8886255
## 200223270003_R07C01  0.8320504  0.9218012 0.80838304  0.8074830 0.85244913  0.8872447
## 200223270006_R01C01  0.2422218  0.1708472 0.08291414  0.5265692 0.44545117  0.2219945
## 200223270006_R04C01  0.2429551  0.1067122 0.84460055  0.7341953 0.02575036  0.1520624
##                     cg20507276 cg02356645 cg07028768 cg00272795 cg25758034 cg16178271
## 200223270003_R02C01 0.12238910  0.5105903  0.4496851 0.46365138  0.6114028  0.6445416
## 200223270003_R03C01 0.38721972  0.5833923  0.8536078 0.82839260  0.6649219  0.6178075
## 200223270003_R06C01 0.47978438  0.5701428  0.8356936 0.07231279  0.2393844  0.6641952
## 200223270003_R07C01 0.02261996  0.5683381  0.4245893 0.78303831  0.7071501  0.7148058
## 200223270006_R01C01 0.37465798  0.5233692  0.8835151 0.78219952  0.2301078  0.6138954
## 200223270006_R04C01 0.03570795  0.9188670  0.4514661 0.44408249  0.6891513  0.9414188
##                     cg27639199 cg11187460 cg21209485 cg14527649 cg23161429 cg19512141
## 200223270003_R02C01 0.67515415 0.03672179  0.8865053  0.2678912  0.8956965  0.8209161
## 200223270003_R03C01 0.67552763 0.92516409  0.8714878  0.7954683  0.9099619  0.7903543
## 200223270003_R06C01 0.06233093 0.03109553  0.2292550  0.8350610  0.8833895  0.8404684
## 200223270003_R07C01 0.05701332 0.53283119  0.2351526  0.8428684  0.9134709  0.2202759
## 200223270006_R01C01 0.05037694 0.54038146  0.8882046  0.8231348  0.8738558  0.8059589
## 200223270006_R04C01 0.08144161 0.91096169  0.2292483  0.8022444  0.9104210  0.7020247
##                     cg02320265 cg20370184 cg12284872 cg04664583 cg11247378 cg26069044
## 200223270003_R02C01  0.8853213 0.37710950  0.8008333  0.5572814  0.1591185 0.92401867
## 200223270003_R03C01  0.4686314 0.05737964  0.7414569  0.5881190  0.7874849 0.94072227
## 200223270003_R06C01  0.4838749 0.04740505  0.7725267  0.9352717  0.4807942 0.93321315
## 200223270003_R07C01  0.8986848 0.83572095  0.7573369  0.9350230  0.4537348 0.56567694
## 200223270006_R01C01  0.8987560 0.04056608  0.7201607  0.9424588  0.1537079 0.94369927
## 200223270006_R04C01  0.4768520 0.04038589  0.8021446  0.9379537  0.1686356 0.02040391
##                     cg25879395 cg00999469 cg06112204 cg02932958 cg19377607 cg12784167
## 200223270003_R02C01 0.88130864  0.3274080  0.5251592  0.7901008 0.05377464 0.81503498
## 200223270003_R03C01 0.02603438  0.2857719  0.8773488  0.4210489 0.90570746 0.02811410
## 200223270003_R06C01 0.91060615  0.2499229  0.8867975  0.3825995 0.06636174 0.03073269
## 200223270003_R07C01 0.89205942  0.2819622  0.5613799  0.7617081 0.68788639 0.84775699
## 200223270006_R01C01 0.47886249  0.2933539  0.9184122  0.8431126 0.06338988 0.83825789
## 200223270006_R04C01 0.02145248  0.2966623  0.9152514  0.7610084 0.91551446 0.45475291
##                     cg07480176 cg00696044 cg18819889 cg00689685 cg00675157 cg03660162
## 200223270003_R02C01  0.5171664 0.55608424  0.9156157  0.7019389  0.9188438  0.8691767
## 200223270003_R03C01  0.3760452 0.07552381  0.9004455  0.8634268  0.9242325  0.5160770
## 200223270003_R06C01  0.6998389 0.79270858  0.9054439  0.6378795  0.9254708  0.9026304
## 200223270003_R07C01  0.2189042 0.03548419  0.9089935  0.8624541  0.5447244  0.5305691
## 200223270006_R01C01  0.5570021 0.10714386  0.9065397  0.6361891  0.5173554  0.9257451
## 200223270006_R04C01  0.4501196 0.18420803  0.9242767  0.6356260  0.9247232  0.8935772
##                     cg10985055 cg07138269 cg21697769 cg08779649 cg01933473 cg17906851
## 200223270003_R02C01  0.8518169  0.5002290  0.8946108 0.44449401  0.2589014  0.9488392
## 200223270003_R03C01  0.8631895  0.9426707  0.2822953 0.45076825  0.6726133  0.9529718
## 200223270003_R06C01  0.5456633  0.5057781  0.8698740 0.04810217  0.2642560  0.6462151
## 200223270003_R07C01  0.8825100  0.9400527  0.9134887 0.42715969  0.1978068  0.9553497
## 200223270006_R01C01  0.8841690  0.9321602  0.2683820 0.89313476  0.7599441  0.6222117
## 200223270006_R04C01  0.8407797  0.9333501  0.2765740 0.59523771  0.7405661  0.6441202
##                     cg14307563 cg12776173 cg24851651 cg08584917 cg16788319 cg24506579
## 200223270003_R02C01  0.1855966 0.10388038 0.03674702  0.5663205  0.9379870  0.5244337
## 200223270003_R03C01  0.8916957 0.87306345 0.05358297  0.9019732  0.8913429  0.5794845
## 200223270003_R06C01  0.8750052 0.70094907 0.05968923  0.9187789  0.8680680  0.9427785
## 200223270003_R07C01  0.8975663 0.11367159 0.60864179  0.6007449  0.8811444  0.9323844
## 200223270006_R01C01  0.8762842 0.09458405 0.08825834  0.9069098  0.3123481  0.9185355
## 200223270006_R04C01  0.9168614 0.86532175 0.91932068  0.9263584  0.2995627  0.4332642
##                     cg01549082 cg12466610 cg15633912 cg01413796 cg20678988
## 200223270003_R02C01  0.2924138 0.05767659  0.1605530  0.1345128  0.8438718
## 200223270003_R03C01  0.7065693 0.59131778  0.9333421  0.2830672  0.8548886
## 200223270003_R06C01  0.2895440 0.06939623  0.8737362  0.8194681  0.7786685
## 200223270003_R07C01  0.6422955 0.04527733  0.9137334  0.9007710  0.8260541
## 200223270006_R01C01  0.8471236 0.05212904  0.9169706  0.2603027  0.3295384
## 200223270006_R04C01  0.6949888 0.05104033  0.8890004  0.9207672  0.8541667

9.2.2 Logistic Regression Model

9.2.2.1 Logistic Regression Model Training

# Data and feature names produced by the median-based selection above
df_LRM1 <- processed_data
featureName_LRM1 <- AfterProcess_FeatureName
library(glmnet)
library(caret)

set.seed(123) 
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 455 156
dim(testData)
## [1] 193 156
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")

cm_FeatEval_Median_LRM1<-caret::confusionMatrix(predictions, testData$DX)

print(cm_FeatEval_Median_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       46        7  14
##   Dementia  3       10   4
##   MCI      17       11  81
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7098          
##                  95% CI : (0.6403, 0.7728)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 2.018e-08       
##                                           
##                   Kappa : 0.4987          
##                                           
##  Mcnemar's Test P-Value : 0.1607          
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6970         0.35714     0.8182
## Specificity             0.8346         0.95758     0.7021
## Pos Pred Value          0.6866         0.58824     0.7431
## Neg Pred Value          0.8413         0.89773     0.7857
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2383         0.05181     0.4197
## Detection Prevalence    0.3472         0.08808     0.5648
## Balanced Accuracy       0.7658         0.65736     0.7602
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Median_LRM1_Accuracy <- cm_FeatEval_Median_LRM1$overall["Accuracy"]
cm_FeatEval_Median_LRM1_Kappa <- cm_FeatEval_Median_LRM1$overall["Kappa"]

print(cm_FeatEval_Median_LRM1_Accuracy)
##  Accuracy 
## 0.7098446
print(cm_FeatEval_Median_LRM1_Kappa)
##     Kappa 
## 0.4987013
print(model_LRM1)
## glmnet 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001810831  0.6350263  0.3962356
##   0.10   0.0018108309  0.6460636  0.4102125
##   0.10   0.0181083090  0.6548792  0.4144240
##   0.55   0.0001810831  0.6263550  0.3765308
##   0.55   0.0018108309  0.6505792  0.4121576
##   0.55   0.0181083090  0.6483336  0.3870111
##   1.00   0.0001810831  0.6065010  0.3457739
##   1.00   0.0018108309  0.6394930  0.3907984
##   1.00   0.0181083090  0.5867925  0.2663062
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01810831.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)

FeatEval_Median_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.96043956043956"
print(FeatEval_Median_LRM1_trainAccuracy)
## [1] 0.9604396
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.6326693
FeatEval_Median_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Median_mean_accuracy_cv_LRM1)
## [1] 0.6326693
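Note that `mean(model_LRM1$results$Accuracy)` averages over every (alpha, lambda) pair, mixing well and poorly tuned fits. If only the selected model's cross-validated accuracy is wanted, caret's `results` and `bestTune` tables can be joined; a base-R sketch with toy values taken from the printout above:

```r
# model$results holds one row per tuning combination; bestTune holds the
# winning pair. merge() keeps only the row matching the chosen model.
results <- data.frame(alpha    = c(0.10, 0.10, 0.55),
                      lambda   = c(0.00018, 0.0181, 0.0018),
                      Accuracy = c(0.635, 0.655, 0.651))
bestTune <- data.frame(alpha = 0.10, lambda = 0.0181)
best_cv_accuracy <- merge(results, bestTune)$Accuracy
# best_cv_accuracy is 0.655
```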
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <-auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <-auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <-auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)

  # One-vs-rest: compute a binary ROC curve and AUC for each class
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  # Plot all curves; colours match the legend entries
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8487
## The AUC value for class CN is: 0.8487235 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.831
## The AUC value for class Dementia is: 0.8309524 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.819
## The AUC value for class MCI is: 0.8190415

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_LRM1_AUC <-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8329058
print(FeatEval_Median_LRM1_AUC)
## [1] 0.8329058
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##                CN  Dementia    MCI
## PC1        90.425 1.000e+02  0.000
## PC2        46.685 7.872e+01  0.000
## PC3         5.951 0.000e+00 68.294
## cg00962106 63.062 1.183e+01 36.941
## cg02225060 23.025 1.263e+01 51.150
## cg14710850 49.622 8.389e+00 25.399
## cg27452255 49.062 1.787e+01 11.822
## cg02981548 26.231 5.636e+00 49.023
## cg08861434 48.675 0.000e+00 42.758
## cg19503462 25.904 4.810e+01  5.776
## cg07152869 27.981 4.673e+01  1.351
## cg16749614 11.548 1.797e+01 45.950
## cg05096415  1.408 4.492e+01 28.936
## cg23432430 44.232 3.494e+00 25.269
## cg17186592  3.085 4.200e+01 26.692
## cg00247094 15.876 4.167e+01 10.433
## cg09584650 41.425 6.526e+00 18.542
## cg11133939 24.211 4.538e-03 40.491
## cg16715186 39.196 7.692e+00 17.052
## cg03129555 12.445 3.860e+01  8.423
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
  
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM1)  
  
}
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum
  # importance value across the three classes,
  # then sort the features by that value.
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
##             CN     Dementia         MCI    Feature MaxImportance
## 1   90.4249959 1.000000e+02  0.00000000        PC1   100.0000000
## 2   46.6852511 7.871687e+01  0.00000000        PC2    78.7168695
## 3    5.9514284 0.000000e+00 68.29377031        PC3    68.2937703
## 4   63.0624779 1.183060e+01 36.94103318 cg00962106    63.0624779
## 5   23.0253134 1.262791e+01 51.15044480 cg02225060    51.1504448
## 6   49.6219760 8.389275e+00 25.39916547 cg14710850    49.6219760
## 7   49.0621672 1.787253e+01 11.82240946 cg27452255    49.0621672
## 8   26.2311272 5.636412e+00 49.02310927 cg02981548    49.0231093
## 9   48.6751550 0.000000e+00 42.75848513 cg08861434    48.6751550
## 10  25.9044959 4.810083e+01  5.77582047 cg19503462    48.1008290
## 11  27.9805710 4.672807e+01  1.35109756 cg07152869    46.7280747
## 12  11.5479104 1.797038e+01 45.95027254 cg16749614    45.9502725
## 13   1.4082456 4.491590e+01 28.93584840 cg05096415    44.9158984
## 14  44.2323996 3.493809e+00 25.26937151 cg23432430    44.2323996
## 15   3.0847630 4.200475e+01 26.69196564 cg17186592    42.0047521
## 16  15.8757027 4.166882e+01 10.43251038 cg00247094    41.6688243
## 17  41.4246158 6.526166e+00 18.54231378 cg09584650    41.4246158
## 18  24.2106288 4.537710e-03 40.49102060 cg11133939    40.4910206
## 19  39.1961911 7.691869e+00 17.05238456 cg16715186    39.1961911
## 20  12.4453101 3.860078e+01  8.42261474 cg03129555    38.6007810
## 21   3.1873741 2.010300e+01 38.48198122 cg08857872    38.4819812
## 22  12.1355897 3.683036e+01 11.12168078 cg06864789    36.8303605
## 23   0.0000000 3.529451e+01 26.74302862 cg14924512    35.2945137
## 24   7.2093862 1.187674e+01 34.91993102 cg16652920    34.9199310
## 25  19.1196758 3.458789e+01  0.00000000 cg03084184    34.5878874
## 26   3.6622134 1.336219e+01 34.16757960 cg26219488    34.1675796
## 27  13.4786533 3.380525e+01  6.06343454 cg20913114    33.8052491
## 28   7.1379682 3.346823e+01 11.81330708 cg06378561    33.4682333
## 29  33.3195658 1.548121e+01  2.09663452 cg26948066    33.3195658
## 30   0.5659275 3.328737e+01 17.47322453 cg25259265    33.2873719
## 31  33.2741078 0.000000e+00 21.54049071 cg06536614    33.2741078
## 32   1.6389436 3.231544e+01 17.24929833 cg24859648    32.3154384
## 33  12.7583748 3.077107e+01  2.19901369 cg12279734    30.7710702
## 34  30.6869142 1.115687e+01  2.49273174 cg03982462    30.6869142
## 35   1.2161924 3.061483e+01 16.61061643 cg05841700    30.6148318
## 36  29.8384323 7.653346e+00  7.72518875 cg11227702    29.8384323
## 37  25.3670810 0.000000e+00 29.02265454 cg12146221    29.0226545
## 38   9.6440880 8.953102e+00 28.93661162 cg02621446    28.9366116
## 39   0.0000000 2.259029e+01 28.84500791 cg00616572    28.8450079
## 40  28.4448781 8.986950e+00  6.53998672 cg15535896    28.4448781
## 41  25.4389044 0.000000e+00 28.23931149 cg02372404    28.2393115
## 42   5.0575037 2.778766e+01  8.14605297 cg09854620    27.7876601
## 43  27.5958249 0.000000e+00 15.87407881 cg04248279    27.5958249
## 44   4.0039872 7.702389e+00 27.54323251 cg20678988    27.5432325
## 45   0.0000000 2.751513e+01 13.83386590 cg24861747    27.5151349
## 46  27.4742117 1.566503e+01  0.00000000 cg10240127    27.4742117
## 47   7.7737177 7.231522e+00 27.22250002 cg16771215    27.2225000
## 48   0.6492323 2.697669e+01 14.65073407 cg01667144    26.9766923
## 49  26.9399272 8.941344e+00  2.81296798 cg13080267    26.9399272
## 50   0.0000000 2.614333e+01 26.59118756 cg02494911    26.5911876
## 51   9.3803376 2.645868e+01  5.12832350 cg10750306    26.4586835
## 52  25.4634769 1.206253e+00 11.27077718 cg11438323    25.4634769
## 53   4.8762323 4.031203e+00 25.42783468 cg06715136    25.4278347
## 54  25.1331048 0.000000e+00 15.36655565 cg04412904    25.1331048
## 55   4.7684737 2.484828e+01  5.39878099 cg12738248    24.8482839
## 56  24.4026477 0.000000e+00 18.67996447 cg03071582    24.4026477
## 57   0.0000000 2.429592e+01 15.80574961 cg05570109    24.2959228
## 58  24.2220488 2.028670e+01  0.00000000 cg15775217    24.2220488
## 59   0.0000000 1.995221e+01 24.19626839 cg24873924    24.1962684
## 60   7.5586975 4.158452e+00 24.13453309 cg17738613    24.1345331
## 61  23.8194963 0.000000e+00 20.82012346 cg01921484    23.8194963
## 62   0.0000000 1.631677e+01 23.68025333 cg10369879    23.6802533
## 63   0.0000000 1.839911e+01 23.65030836 cg27341708    23.6503084
## 64   0.0000000 2.355460e+01 21.43182704 cg12534577    23.5546047
## 65   0.0000000 2.341380e+01 17.83628431 cg18821122    23.4137998
## 66   4.6163643 6.919010e+00 23.35074707 cg12682323    23.3507471
## 67  23.3205645 0.000000e+00 14.18835420 cg05234269    23.3205645
## 68  23.0340417 0.000000e+00 22.81275180 cg20685672    23.0340417
## 69  20.3877963 0.000000e+00 22.84051946 cg12228670    22.8405195
## 70  22.7103929 3.661257e+00  8.33658882 cg11331837    22.7103929
## 71   0.0000000 2.269135e+01 20.85644106 cg01680303    22.6913512
## 72  22.4135654 1.167086e+00 10.22120185 cg17421046    22.4135654
## 73  22.2717800 8.042375e+00  2.25958376 cg03088219    22.2717800
## 74  22.2642367 1.930200e+01  0.00000000 cg00322003    22.2642367
## 75  22.2444520 1.528002e+01  0.00000000 cg02356645    22.2444520
## 76   5.8948426 2.207822e+01  1.26185244 cg01013522    22.0782243
## 77  12.6157774 0.000000e+00 21.83055417 cg00272795    21.8305542
## 78  21.6475067 0.000000e+00 14.53413956 cg25758034    21.6475067
## 79   4.7766387 2.163354e+01  1.18820728 cg26474732    21.6335393
## 80   0.0000000 2.128988e+01 17.62871803 cg16579946    21.2898785
## 81  21.2110158 4.532875e+00  5.64862793 cg11187460    21.2110158
## 82   9.6192362 2.120815e+01  0.00000000 cg07523188    21.2081474
## 83   0.0000000 1.703337e+01 20.79581060 cg14527649    20.7958106
## 84   2.7306253 4.862647e+00 20.54616679 cg20370184    20.5461668
## 85  20.5342917 0.000000e+00 13.71034162 cg17429539    20.5342917
## 86   0.0000000 2.028684e+01 10.03089297 cg20507276    20.2868418
## 87   1.1826772 6.821757e+00 20.19751808 cg13885788    20.1975181
## 88   0.0000000 1.557720e+01 20.08673687 cg16178271    20.0867369
## 89   5.5921343 1.527010e+00 19.98502402 cg10738648    19.9850240
## 90   5.1468761 1.991958e+01  2.75466494 cg26069044    19.9195759
## 91   3.1971419 4.954795e+00 19.79319869 cg25879395    19.7931987
## 92  19.6440926 0.000000e+00 12.12328511 cg06112204    19.6440926
## 93   3.2284298 1.923166e+01  1.27197851 cg23161429    19.2316573
## 94  19.0437246 0.000000e+00  8.87119081 cg25436480    19.0437246
## 95  18.8943565 1.898245e+01  0.00000000 cg26757229    18.9824479
## 96  18.8530531 8.147457e+00  0.00000000 cg02932958    18.8530531
## 97   6.3452415 1.863514e+01  0.95794695 cg18339359    18.6351383
## 98  18.5829115 1.503090e+00  1.89843568 cg06950937    18.5829115
## 99  12.0369279 1.857703e+01  0.00000000 cg23916408    18.5770326
## 100  1.5243862 3.188027e+00 18.16777827 cg12784167    18.1677783
## 101 11.8999547 0.000000e+00 18.13677901 cg07480176    18.1367790
## 102  0.0000000 5.486660e+00 17.70082105 cg15865722    17.7008211
## 103 17.6735632 0.000000e+00 13.05289022 cg27577781    17.6735632
## 104 17.1592611 2.949446e+00  2.52035333 cg05321907    17.1592611
## 105 16.8576564 0.000000e+00  7.58718214 cg03660162    16.8576564
## 106 16.7701657 0.000000e+00  9.90995624 cg07138269    16.7701657
## 107 16.7249896 6.150798e-04  5.47347172 cg20139683    16.7249896
## 108  1.5108482 1.661274e+01  3.60041267 cg12284872    16.6127427
## 109 16.5452643 0.000000e+00 15.32420554 cg03327352    16.5452643
## 110  0.0000000 1.652720e+01 12.91758315 cg23658987    16.5272039
## 111  0.0000000 1.475029e+01 16.17924038 cg21854924    16.1792404
## 112 15.7844996 0.000000e+00  6.83817946 cg21697769    15.7844996
## 113 15.6692581 5.743518e+00  0.00000000 cg19512141    15.6692581
## 114 10.3234169 0.000000e+00 15.46946157 cg08198851    15.4694616
## 115  0.4310200 1.509647e+01  0.82976761 cg00675157    15.0964673
## 116  0.0000000 5.686212e+00 15.01391492 cg01153376    15.0139149
## 117  1.8017259 1.495919e+01  0.76736950 cg01933473    14.9591899
## 118 14.9059930 0.000000e+00  4.57584668 cg12776173    14.9059930
## 119  0.0000000 1.067547e+01 14.71662168 cg14564293    14.7166217
## 120 12.4069328 0.000000e+00 14.57808091 cg24851651    14.5780809
## 121  0.0000000 1.452148e+01  2.25091078 cg22274273    14.5214828
## 122 12.7834657 1.451780e+01  0.00000000 cg25561557    14.5177981
## 123 13.7922467 1.439133e+01  0.00000000 cg21209485    14.3913274
## 124  3.8865187 1.430786e+01  0.00000000 cg10985055    14.3078613
## 125  8.0980649 0.000000e+00 14.23052412 cg14293999    14.2305241
## 126  0.0000000 6.088317e+00 13.97288400 cg18819889    13.9728840
## 127  7.9101725 1.390343e+01  0.00000000 cg24506579    13.9034342
## 128 10.4921193 0.000000e+00 13.81747243 cg19377607    13.8174724
## 129  2.6292166 1.360085e+01  0.00000000 cg06697310    13.6008462
## 130 13.5762704 0.000000e+00 10.14995156 cg00696044    13.5762704
## 131  0.0000000 0.000000e+00 13.11702130 cg01549082    13.1170213
## 132  0.0000000 6.905304e+00 13.06761511 cg01128042    13.0676151
## 133  0.2698545 1.248616e+01  1.16140010 cg00999469    12.4861632
## 134  0.0000000 1.079484e+01 12.40421487 cg06118351    12.4042149
## 135  0.0000000 1.124273e+01 11.78794942 cg12012426    11.7879494
## 136 11.7349496 9.445909e+00  0.00000000 cg08584917    11.7349496
## 137  0.0000000 1.168263e+01  2.24819851 cg15633912    11.6826262
## 138 11.6820808 0.000000e+00 11.18100448 cg27272246    11.6820808
## 139 11.3463436 1.979374e+00  0.00000000 cg17906851    11.3463436
## 140  1.2024457 1.133501e+01  0.00000000 cg16788319    11.3350060
## 141  8.9965158 0.000000e+00 11.28265946 cg07028768    11.2826595
## 142  0.0000000 3.118110e+00 10.75375455 cg27086157    10.7537545
## 143  1.8118794 9.619556e+00  0.00000000 cg14240646     9.6195558
## 144  0.0000000 9.458924e+00  9.21780985 cg00154902     9.4589241
## 145  6.6601294 0.000000e+00  9.11035233 cg14307563     9.1103523
## 146  0.0000000 8.513866e+00  0.00000000 cg02320265     8.5138660
## 147  8.2069811 0.000000e+00  7.04448942 cg08779649     8.2069811
## 148  7.6741533 0.000000e+00  7.94681309 cg04664583     7.9468131
## 149  0.0000000 0.000000e+00  6.60014051 cg12466610     6.6001405
## 150  6.2362459 3.714491e+00  0.00000000 cg27639199     6.2362459
## 151  0.0000000 0.000000e+00  5.82266885 cg15501526     5.8226689
## 152  0.0000000 4.835409e+00  3.66050252 cg00689685     4.8354086
## 153  2.8005491 0.000000e+00  0.07693353 cg01413796     2.8005491
## 154  0.0000000 0.000000e+00  2.13030107 cg11247378     2.1303011
## 155  0.5215519 0.000000e+00  0.63597308    age.now     0.6359731
if (!requireNamespace("reshape2", quietly = TRUE)) {
  install.packages("reshape2")
}
library(reshape2)

if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
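The `melt()` redirection warning above can be avoided by moving to tidyr, which supersedes reshape2; `pivot_longer` produces the same long format (a sketch, assuming tidyr is installed):

```r
# tidyr replacement for the superseded reshape2::melt call above;
# produces the same Feature / Class / Importance long format.
library(tidyr)

importance_melted_LRM1_df <- importance_model_LRM1_df %>%
  dplyr::select(-MaxImportance) %>%
  pivot_longer(cols = -Feature, names_to = "Class", values_to = "Importance")
```

Note that `pivot_longer` returns `Class` as character rather than factor; add `mutate(Class = factor(Class))` if the plotting code depends on factor level ordering.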

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("The top 20 features selected by the max-importance method:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##           CN     Dementia       MCI    Feature MaxImportance
## 1  90.424996 100.00000000  0.000000        PC1     100.00000
## 2  46.685251  78.71686950  0.000000        PC2      78.71687
## 3   5.951428   0.00000000 68.293770        PC3      68.29377
## 4  63.062478  11.83059646 36.941033 cg00962106      63.06248
## 5  23.025313  12.62790827 51.150445 cg02225060      51.15044
## 6  49.621976   8.38927495 25.399165 cg14710850      49.62198
## 7  49.062167  17.87253273 11.822409 cg27452255      49.06217
## 8  26.231127   5.63641231 49.023109 cg02981548      49.02311
## 9  48.675155   0.00000000 42.758485 cg08861434      48.67516
## 10 25.904496  48.10082896  5.775820 cg19503462      48.10083
## 11 27.980571  46.72807471  1.351098 cg07152869      46.72807
## 12 11.547910  17.97037830 45.950273 cg16749614      45.95027
## 13  1.408246  44.91589839 28.935848 cg05096415      44.91590
## 14 44.232400   3.49380894 25.269372 cg23432430      44.23240
## 15  3.084763  42.00475211 26.691966 cg17186592      42.00475
## 16 15.875703  41.66882434 10.432510 cg00247094      41.66882
## 17 41.424616   6.52616573 18.542314 cg09584650      41.42462
## 18 24.210629   0.00453771 40.491021 cg11133939      40.49102
## 19 39.196191   7.69186882 17.052385 cg16715186      39.19619
## 20 12.445310  38.60078097  8.422615 cg03129555      38.60078
## [1] "The top 20 features selected by the max-importance method:"
##  [1] "PC1"        "PC2"        "PC3"        "cg00962106" "cg02225060" "cg14710850" "cg27452255"
##  [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555"

9.2.2.2 Model Diagnosis & Improvement

9.2.2.2.1 Class Imbalance
Class Imbalance Check
  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##       CN Dementia      MCI 
##      221       94      333
prop.table(table(df_LRM1$DX))
## 
##        CN  Dementia       MCI 
## 0.3410494 0.1450617 0.5138889
table(trainData$DX)
## 
##       CN Dementia      MCI 
##      155       66      234
prop.table(table(trainData$DX))
## 
##        CN  Dementia       MCI 
## 0.3406593 0.1450549 0.5142857
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the whole data set is:")
    ## [1] "The imbalance ratio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 3.542553
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the training data set is:")
    ## [1] "The imbalance ratio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 3.545455
  • Let’s run a Chi-squared test, which determines whether the class distribution deviates significantly from a balanced distribution. The test’s p-value indicates how significant the class imbalance is.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 132.4, df = 2, p-value < 2.2e-16
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 93.156, df = 2, p-value < 2.2e-16
Address Class Imbalance Using “SMOTE” (NOT OK YET; MAY NEED FURTHER IMPROVEMENT)
library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"],
                          target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##       CN Dementia      MCI 
##      155      132      234
dim(balanced_data_LGR_1)
## [1] 521 156
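One possible refinement of the SMOTE step flagged above as not yet OK: with `dup_size = 1`, the minority class only doubles (66 → 132) and remains well below the majority class (234). Choosing `dup_size` from the observed class counts oversamples the minority class closer to the majority (a sketch, not yet validated on this data; `smotefamily::SMOTE` oversamples the smallest class):

```r
# Choose dup_size so the minority class grows toward the majority size:
# ceiling(majority / minority) - 1 extra synthetic copies per minority sample.
class_counts <- table(trainData$DX)
needed_dup <- ceiling(max(class_counts) / min(class_counts)) - 1

smote_tuned <- SMOTE(X = trainData[, !names(trainData) %in% "DX"],
                     target = trainData$DX, K = 5, dup_size = needed_dup)
balanced_tuned <- smote_tuned$data
colnames(balanced_tuned)[colnames(balanced_tuned) == "class"] <- "DX"
table(balanced_tuned$DX)  # minority count should now be near the majority
```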
Fit the Model with the Balanced Data
ctrl <- trainControl(method = "cv", number = 5)

model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       45        6  15
##   Dementia  4       11   6
##   MCI      17       11  78
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6943          
##                  95% CI : (0.6241, 0.7584)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 2.356e-07       
##                                           
##                   Kappa : 0.4779          
##                                           
##  Mcnemar's Test P-Value : 0.5733          
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6818         0.39286     0.7879
## Specificity             0.8346         0.93939     0.7021
## Pos Pred Value          0.6818         0.52381     0.7358
## Neg Pred Value          0.8346         0.90116     0.7586
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2332         0.05699     0.4041
## Detection Prevalence    0.3420         0.10881     0.5492
## Balanced Accuracy       0.7582         0.66613     0.7450
print(model_LRM2)
## glmnet 
## 
## 521 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 416, 417, 417, 417, 417 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa    
##   0.10   0.000186946  0.7103114  0.5552305
##   0.10   0.001869460  0.7121978  0.5563269
##   0.10   0.018694597  0.7160989  0.5621857
##   0.55   0.000186946  0.7007143  0.5397047
##   0.55   0.001869460  0.7102930  0.5525186
##   0.55   0.018694597  0.6872894  0.5142517
##   1.00   0.000186946  0.6834432  0.5136505
##   1.00   0.001869460  0.7006777  0.5383593
##   1.00   0.018694597  0.6468864  0.4489232
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0186946.
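caret searched only its default 3×3 alpha/lambda grid above; supplying an explicit `tuneGrid` explores the elastic-net penalty more finely and may improve on alpha = 0.1 (a sketch with illustrative grid values, not results from this data):

```r
# Denser, illustrative elastic-net tuning grid; the values are hypothetical.
tune_grid <- expand.grid(alpha  = seq(0, 1, by = 0.25),
                         lambda = 10^seq(-4, -1, length.out = 10))

model_LRM2_grid <- caret::train(DX ~ ., data = balanced_data_LGR_1,
                                method = "glmnet", trControl = ctrl,
                                tuneGrid = tune_grid)
```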
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.958241758241758"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.6964347
importance_model_LRM2 <- varImp(model_LRM2)

print(importance_model_LRM2)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##                CN Dementia    MCI
## PC1        80.679  100.000  0.000
## PC2        38.884   80.691  0.000
## cg00962106 56.198    9.095 33.495
## PC3         7.501    0.000 55.870
## cg19503462 26.315   48.639  6.536
## cg27452255 47.903   21.179  8.088
## cg07152869 27.968   45.986  1.294
## cg05096415  3.337   45.587 28.318
## cg02225060 18.272   12.775 45.585
## cg14710850 45.324    8.651 21.701
## cg02981548 23.097    5.920 45.302
## cg08861434 44.863    0.000 36.602
## cg03129555 14.450   42.015 10.562
## cg23432430 41.988    6.875 20.297
## cg16749614  8.925   17.010 41.737
## cg17186592  3.590   40.130 25.168
## cg14924512  1.855   38.982 23.221
## cg09584650 38.240    7.576 15.082
## cg06864789 13.558   38.083 11.893
## cg03084184 19.822   37.834  3.055
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
  
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)

library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM2)  
  
}
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum
  # importance value across the three classes,
  # then sort the features by that value.
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM2_df)
  
}
##               CN     Dementia          MCI    Feature MaxImportance
## 1   80.679428863 100.00000000  0.000000000        PC1   100.0000000
## 2   38.883792330  80.69142135  0.000000000        PC2    80.6914213
## 3   56.198105473   9.09453097 33.494895269 cg00962106    56.1981055
## 4    7.501419843   0.00000000 55.870041073        PC3    55.8700411
## 5   26.314966715  48.63888581  6.535588437 cg19503462    48.6388858
## 6   47.903297796  21.17904018  8.087900003 cg27452255    47.9032978
## 7   27.968017421  45.98633975  1.294250776 cg07152869    45.9863398
## 8    3.336642534  45.58715847 28.317868132 cg05096415    45.5871585
## 9   18.272375469  12.77464831 45.585242833 cg02225060    45.5852428
## 10  45.324125992   8.65147591 21.700847291 cg14710850    45.3241260
## 11  23.097249751   5.92030775 45.302133511 cg02981548    45.3021335
## 12  44.863400323   0.00000000 36.602318185 cg08861434    44.8634003
## 13  14.450136041  42.01469779 10.561940960 cg03129555    42.0146978
## 14  41.988198265   6.87510878 20.296659603 cg23432430    41.9881983
## 15   8.925155758  17.01038504 41.736552660 cg16749614    41.7365527
## 16   3.590290791  40.13003730 25.168269158 cg17186592    40.1300373
## 17   1.854949730  38.98245410 23.221432408 cg14924512    38.9824541
## 18  38.239988922   7.57626268 15.081591010 cg09584650    38.2399889
## 19  13.558155574  38.08328750 11.893308719 cg06864789    38.0832875
## 20  19.822087774  37.83431155  3.055188849 cg03084184    37.8343116
## 21  21.496100293   0.52009240 37.528929069 cg11133939    37.5289291
## 22  13.593984894  37.18964818  9.117936907 cg00247094    37.1896482
## 23   0.535099440  20.67938057 35.715187958 cg08857872    35.7151880
## 24  35.480845254   7.95874622 14.041837327 cg16715186    35.4808453
## 25   4.934937153  35.04718906 17.442152969 cg24859648    35.0471891
## 26  14.089726349  34.55372136  5.433977699 cg12279734    34.5537214
## 27   1.727162103  34.09962713 18.442895227 cg25259265    34.0996271
## 28   8.425461156  34.05545001 11.634704145 cg06378561    34.0554500
## 29   2.317735890  13.36092372 31.981929900 cg26219488    31.9819299
## 30  12.464527158  31.58726203  5.781375790 cg20913114    31.5872620
## 31   5.482910493  11.24716578 31.373100048 cg16652920    31.3731000
## 32   1.406006839  30.96767657 17.376552573 cg05841700    30.9676766
## 33  29.671776171  14.07606681  0.800915538 cg26948066    29.6717762
## 34  28.733576534  12.27805126  0.034366374 cg03982462    28.7335765
## 35  28.251208564   8.08782117  6.647490164 cg11227702    28.2512086
## 36   6.457033630  28.04864581  8.136034298 cg09854620    28.0486458
## 37  27.485718562   0.00000000 21.536098995 cg06536614    27.4857186
## 38   7.543830482   9.69486687 27.094786126 cg02621446    27.0947861
## 39   0.000000000  26.98547093 24.131491853 cg02494911    26.9854709
## 40  20.458258549   0.00000000 26.629992272 cg12146221    26.6299923
## 41   0.000000000  25.78694507 26.611100611 cg00616572    26.6111006
## 42   9.536723526  26.42939213  5.651344888 cg10750306    26.4293921
## 43  26.168568224   7.88086972  6.037506608 cg15535896    26.1685682
## 44   1.141041466  25.93481794 13.648772957 cg01667144    25.9348179
## 45   0.000000000  25.62797982 13.464484954 cg24861747    25.6279798
## 46  25.545489329  15.10535058  0.000000000 cg10240127    25.5454893
## 47  24.102526327   0.00000000 25.136745728 cg02372404    25.1367457
## 48   1.110369810   8.19438850 25.058064292 cg06715136    25.0580643
## 49  24.827775106   0.00000000 16.155963858 cg20685672    24.8277751
## 50   0.000000000  24.76952655 14.635617490 cg05570109    24.7695265
## 51  24.719596431   0.00000000 13.464185675 cg04248279    24.7195964
## 52   4.027296440   5.52070774 24.339307078 cg20678988    24.3393071
## 53   0.000000000  24.19961748 18.409106406 cg12534577    24.1996175
## 54   0.000000000  24.15328295 15.846410021 cg16579946    24.1532829
## 55   4.826664207  24.12335487  5.714614685 cg12738248    24.1233549
## 56   6.533547554   5.92725750 24.066141752 cg16771215    24.0661418
## 57  23.998797144  10.16532801  0.028545462 cg13080267    23.9987971
## 58   5.506607573   5.67362617 23.067400417 cg17738613    23.0674004
## 59  22.320443120   6.53640040  5.659119155 cg11331837    22.3204431
## 60   0.000000000  22.28339674 17.216096991 cg01680303    22.2833967
## 61  22.210310684   0.00000000 13.206930759 cg04412904    22.2103107
## 62   0.000000000  22.06548677 14.956522108 cg18821122    22.0654868
## 63   3.420024153   7.31988987 22.051844895 cg12682323    22.0518449
## 64  22.042423499  16.24236556  0.000000000 cg02356645    22.0424235
## 65   0.000000000  20.83329334 22.027588678 cg24873924    22.0275887
## 66   0.000000000  15.83377891 21.998528622 cg10369879    21.9985286
## 67   6.480617738  21.72590957  0.939133148 cg01013522    21.7259096
## 68  16.495721453   0.00000000 21.583480114 cg12228670    21.5834801
## 69   7.519511314  21.11628056  0.000000000 cg07523188    21.1162806
## 70  21.103724796  18.09207717  0.000000000 cg15775217    21.1037248
## 71  20.985180608   0.00000000 16.898707961 cg03071582    20.9851806
## 72  20.943857120   0.00000000 12.124679760 cg05234269    20.9438571
## 73   0.000000000  20.89510385  7.918854390 cg20507276    20.8951039
## 74   0.000000000  19.10810611 20.829416683 cg27341708    20.8294167
## 75  13.165537730  20.45343113  0.000000000 cg25561557    20.4534311
## 76  20.436701259   8.86519380  0.349448845 cg03088219    20.4367013
## 77  20.387453896   0.00000000 19.555842902 cg01921484    20.3874539
## 78   4.715766302  20.18828004  4.199165451 cg26069044    20.1882800
## 79  20.128075536   0.00000000  7.556981121 cg06112204    20.1280755
## 80  20.076550652   0.00000000 10.293594056 cg25758034    20.0765507
## 81  20.065778545   0.22748177  9.400207555 cg17421046    20.0657785
## 82  19.735141486   0.00000000  9.881499267 cg17429539    19.7351415
## 83  19.731674390   0.00000000 12.798875580 cg11438323    19.7316744
## 84  19.532262668  14.86086455  0.000000000 cg00322003    19.5322627
## 85  19.322487367   4.15331038  4.746838336 cg11187460    19.3224874
## 86   2.510648117   5.41891238 18.970344056 cg25879395    18.9703441
## 87   4.055575121  18.84551312  0.228938987 cg26474732    18.8455131
## 88   2.893715763  18.78832756  2.430352370 cg23161429    18.7883276
## 89   1.682911789   4.78952445 18.695452533 cg20370184    18.6954525
## 90  18.641868330   0.02258014  6.337732939 cg25436480    18.6418683
## 91   0.009327755   7.64286248 18.625807773 cg13885788    18.6258078
## 92  11.435914221  18.25338438  0.000000000 cg23916408    18.2533844
## 93   0.000000000  16.67258119 18.154192689 cg14527649    18.1541927
## 94   5.003807356   1.01079493 18.046203985 cg10738648    18.0462040
## 95   0.000000000  17.96700771 12.794428811 cg23658987    17.9670077
## 96   5.991551785  17.95117220  1.290455798 cg18339359    17.9511722
## 97  10.240858920   0.00000000 17.847594160 cg07480176    17.8475942
## 98  16.799265388  17.79043659  0.000000000 cg26757229    17.7904366
## 99   2.974513464  17.78047141  4.060710433 cg12284872    17.7804714
## 100  8.047755488  17.46985680  0.000000000 cg24506579    17.4698568
## 101 17.448328611   8.51043165  0.000000000 cg02932958    17.4483286
## 102 13.323326679   0.00000000 17.355723517 cg00272795    17.3557235
## 103  0.000000000   7.44854412 17.195823649 cg12784167    17.1958236
## 104 16.752406581   0.00000000  6.655601915 cg03660162    16.7524066
## 105  0.000000000  16.01388121 16.463252853 cg16178271    16.4632529
## 106 16.360904095   0.00000000 11.982490178 cg27577781    16.3609041
## 107 16.148865611   0.00000000  8.265317801 cg07138269    16.1488656
## 108 15.970109102   2.88564640  2.056078529 cg05321907    15.9701091
## 109  0.758550200  15.68809089  2.141928397 cg22274273    15.6880909
## 110  0.469063822   3.15344245 15.547390512 cg15865722    15.5473905
## 111 13.420379863  15.52876319  0.000000000 cg21209485    15.5287632
## 112 15.462630452   0.63364112  3.697728095 cg20139683    15.4626305
## 113  0.805213233  15.27248433  2.251238780 cg15633912    15.2724843
## 114  1.781942955  15.20891238  0.499178354 cg00675157    15.2089124
## 115  0.000000000  15.01349056 13.725064390 cg21854924    15.0134906
## 116  0.000000000   8.30445801 14.977088715 cg14564293    14.9770887
## 117  1.414740021  14.67400925  1.624599344 cg01933473    14.6740093
## 118 14.358410506   0.00000000  2.353605554 cg06950937    14.3584105
## 119  7.036160578   0.00000000 14.260960723 cg14293999    14.2609607
## 120  0.000000000   7.62763969 14.099430071 cg01128042    14.0994301
## 121 13.967034375   0.00000000  2.023110140 cg12776173    13.9670344
## 122 13.960184792   0.00000000 13.905326596 cg03327352    13.9601848
## 123  8.333769356   0.00000000 13.928550552 cg24851651    13.9285506
## 124 13.708623191   0.00000000  7.305663059 cg00696044    13.7086232
## 125  8.532678036   0.00000000 13.700028272 cg19377607    13.7000283
## 126  0.000000000   2.79030342 13.616930212 cg01153376    13.6169302
## 127 13.578106407   3.86934109  0.000000000 cg19512141    13.5781064
## 128  0.000000000   6.31061501 13.528579969 cg18819889    13.5285800
## 129  8.860694660   0.00000000 13.136946476 cg27272246    13.1369465
## 130 12.221081080   0.00000000 12.990255691 cg08198851    12.9902557
## 131  0.000000000   9.82358796 12.685100159 cg06118351    12.6851002
## 132  4.058933776  12.40341568  0.000000000 cg10985055    12.4034157
## 133  0.930376494  11.76495324  0.005222529 cg16788319    11.7649532
## 134  1.061079854  11.75211794  0.000000000 cg14240646    11.7521179
## 135  0.794740956  11.57126294  0.396509276 cg00999469    11.5712629
## 136  0.000000000  11.34774657 10.931613703 cg12012426    11.3477466
## 137  0.000000000   2.68390094 10.896653093 cg01549082    10.8966531
## 138 10.744293417   0.00000000  9.157358594 cg21697769    10.7442934
## 139 10.665484601   0.00000000  7.591554795 cg07028768    10.6654846
## 140 10.323399632   3.96769044  0.000000000 cg17906851    10.3233996
## 141  0.000000000   8.37319093  9.807447434 cg27086157     9.8074474
## 142  9.745983813   9.21937779  0.000000000 cg08584917     9.7459838
## 143  0.310904240   9.73868693  0.000000000 cg06697310     9.7386869
## 144  0.601000492   9.52007028  0.000000000 cg02320265     9.5200703
## 145  2.508816376   0.00000000  9.494379581 cg04664583     9.4943796
## 146  4.880776384   0.00000000  8.718261699 cg14307563     8.7182617
## 147  6.238606499   0.00000000  8.443705870 cg08779649     8.4437059
## 148  0.000000000   6.06930373  7.360511721 cg00154902     7.3605117
## 149  0.000000000   0.00000000  6.388083312 cg12466610     6.3880833
## 150  6.342855658   4.10143474  0.000000000 cg27639199     6.3428557
## 151  0.000000000   5.86185727  4.806007444 cg00689685     5.8618573
## 152  0.000000000   2.96250768  5.177422179 cg15501526     5.1774222
## 153  2.837884424   0.00000000  0.000000000 cg01413796     2.8378844
## 154  0.421228002   0.00000000  0.566051449    age.now     0.5660514
## 155  0.000000000   0.43807821  0.040984280 cg11247378     0.4380782
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##           CN   Dementia       MCI    Feature MaxImportance
## 1  80.679429 100.000000  0.000000        PC1     100.00000
## 2  38.883792  80.691421  0.000000        PC2      80.69142
## 3  56.198105   9.094531 33.494895 cg00962106      56.19811
## 4   7.501420   0.000000 55.870041        PC3      55.87004
## 5  26.314967  48.638886  6.535588 cg19503462      48.63889
## 6  47.903298  21.179040  8.087900 cg27452255      47.90330
## 7  27.968017  45.986340  1.294251 cg07152869      45.98634
## 8   3.336643  45.587158 28.317868 cg05096415      45.58716
## 9  18.272375  12.774648 45.585243 cg02225060      45.58524
## 10 45.324126   8.651476 21.700847 cg14710850      45.32413
## 11 23.097250   5.920308 45.302134 cg02981548      45.30213
## 12 44.863400   0.000000 36.602318 cg08861434      44.86340
## 13 14.450136  42.014698 10.561941 cg03129555      42.01470
## 14 41.988198   6.875109 20.296660 cg23432430      41.98820
## 15  8.925156  17.010385 41.736553 cg16749614      41.73655
## 16  3.590291  40.130037 25.168269 cg17186592      40.13004
## 17  1.854950  38.982454 23.221432 cg14924512      38.98245
## 18 38.239989   7.576263 15.081591 cg09584650      38.23999
## 19 13.558156  38.083288 11.893309 cg06864789      38.08329
## 20 19.822088  37.834312  3.055189 cg03084184      37.83431
## [1] "the top 20 features based on max way:"
##  [1] "PC1"        "PC2"        "cg00962106" "PC3"        "cg19503462" "cg27452255" "cg07152869"
##  [8] "cg05096415" "cg02225060" "cg14710850" "cg02981548" "cg08861434" "cg03129555" "cg23432430"
## [15] "cg16749614" "cg17186592" "cg14924512" "cg09584650" "cg06864789" "cg03084184"

if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  # One-versus-rest ROC curve and AUC for each class
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # Use palette colors 2:(k + 1) throughout so the curves match the legend
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8505
## The AUC value for class CN is: 0.850513 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8357
## The AUC value for class Dementia is: 0.8357143 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8188
## The AUC value for class MCI is: 0.8188266

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}
## The mean AUC value across all classes with one versus rest method is: 0.835018
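The frequency / common-feature selection described in the version notes (take the top N features from each trained model, count how often each feature appears, and keep those appearing in more than half of the models) can be sketched as follows. The three ranked vectors are hypothetical stand-ins for the per-model top-N lists:

```r
# Hypothetical top-N feature lists from three trained models
top_rf  <- c("PC1", "cg00962106", "cg19503462", "PC2")
top_lrm <- c("PC1", "PC2", "cg00962106", "cg27452255")
top_enm <- c("PC1", "PC2", "cg02225060", "cg00962106")

top_lists <- list(top_rf, top_lrm, top_enm)

# Count in how many models each feature appears among its top N
feature_counts <- table(unlist(top_lists))

# Keep features appearing in more than half of the models
common_features <- names(feature_counts[feature_counts > length(top_lists) / 2])
stopifnot(setequal(common_features, c("PC1", "PC2", "cg00962106")))
```

With real models, the same pattern applies with each `top_*` vector replaced by the head of that model's sorted importance table.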

9.2.3. Elastic Net

9.2.3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)

# alpha = 0 corresponds to ridge and alpha = 1 to lasso; no intermediate mixes are tried here
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))

elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)

print(elastic_net_model1)
## glmnet 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa     
##   0      0.00100000  0.6571736  0.42345797
##   0      0.05357895  0.6725349  0.43439423
##   0      0.10615789  0.6747338  0.43094148
##   0      0.15873684  0.6725599  0.42391171
##   0      0.21131579  0.6725837  0.41818370
##   0      0.26389474  0.6770526  0.42406079
##   0      0.31647368  0.6769804  0.41856449
##   0      0.36905263  0.6726087  0.40853473
##   0      0.42163158  0.6638170  0.38542265
##   0      0.47421053  0.6660148  0.38902178
##   0      0.52678947  0.6594214  0.37628816
##   0      0.57936842  0.6550252  0.36510400
##   0      0.63194737  0.6528274  0.35927177
##   0      0.68452632  0.6418618  0.33471759
##   0      0.73710526  0.6352200  0.31832804
##   0      0.78968421  0.6307756  0.30720022
##   0      0.84226316  0.6263800  0.29777058
##   0      0.89484211  0.6220322  0.28739881
##   0      0.94742105  0.6220322  0.28739881
##   0      1.00000000  0.6220322  0.28682520
##   1      0.00100000  0.6240596  0.37352512
##   1      0.05357895  0.5187546  0.05457313
##   1      0.10615789  0.5142862  0.00000000
##   1      0.15873684  0.5142862  0.00000000
##   1      0.21131579  0.5142862  0.00000000
##   1      0.26389474  0.5142862  0.00000000
##   1      0.31647368  0.5142862  0.00000000
##   1      0.36905263  0.5142862  0.00000000
##   1      0.42163158  0.5142862  0.00000000
##   1      0.47421053  0.5142862  0.00000000
##   1      0.52678947  0.5142862  0.00000000
##   1      0.57936842  0.5142862  0.00000000
##   1      0.63194737  0.5142862  0.00000000
##   1      0.68452632  0.5142862  0.00000000
##   1      0.73710526  0.5142862  0.00000000
##   1      0.78968421  0.5142862  0.00000000
##   1      0.84226316  0.5142862  0.00000000
##   1      0.89484211  0.5142862  0.00000000
##   1      0.94742105  0.5142862  0.00000000
##   1      1.00000000  0.5142862  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2638947.
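The grid above only evaluates the two endpoints, alpha = 0 (ridge) and alpha = 1 (lasso), so no true elastic-net mixture is tested. A minimal sketch of a finer grid, assuming the same `ctrl` and `trainData_ENM1` objects (the grid values are illustrative):

```r
# Hypothetical finer grid: also evaluate intermediate elastic-net mixing values
param_grid_fine <- expand.grid(
  alpha  = seq(0, 1, by = 0.25),   # 0, 0.25, 0.5, 0.75, 1
  lambda = seq(0.001, 1, length = 20)
)
# elastic_net_model_fine <- caret::train(DX ~ ., data = trainData_ENM1,
#                                        method = "glmnet", trControl = ctrl,
#                                        tuneGrid = param_grid_fine)
```

Whether the intermediate alphas improve on the ridge optimum chosen above would have to be checked by rerunning the cross-validation.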
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.5868408
FeatEval_Median_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Median_mean_accuracy_cv_ENM1)
## [1] 0.5868408
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_ENM1$DX)


FeatEval_Median_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.863736263736264"
print(FeatEval_Median_ENM1_trainAccuracy)
## [1] 0.8637363
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Median_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Median_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       45        5  13
##   Dementia  0        8   0
##   MCI      21       15  86
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7202          
##                  95% CI : (0.6512, 0.7823)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 3.473e-09       
##                                           
##                   Kappa : 0.4987          
##                                           
##  Mcnemar's Test P-Value : 6.901e-05       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6818         0.28571     0.8687
## Specificity             0.8583         1.00000     0.6170
## Pos Pred Value          0.7143         1.00000     0.7049
## Neg Pred Value          0.8385         0.89189     0.8169
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2332         0.04145     0.4456
## Detection Prevalence    0.3264         0.04145     0.6321
## Balanced Accuracy       0.7700         0.64286     0.7429
cm_FeatEval_Median_ENM1_Accuracy<-cm_FeatEval_Median_ENM1$overall["Accuracy"]
cm_FeatEval_Median_ENM1_Kappa<-cm_FeatEval_Median_ENM1$overall["Kappa"]
print(cm_FeatEval_Median_ENM1_Accuracy)
##  Accuracy 
## 0.7202073
print(cm_FeatEval_Median_ENM1_Kappa)
##     Kappa 
## 0.4986772
importance_elastic_net_model1<- varImp(elastic_net_model1)


print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##               CN Dementia    MCI
## PC1        86.62  100.000 13.317
## PC2        68.41   88.617 20.144
## cg00962106 72.96   12.359 60.542
## cg02225060 43.13   18.831 62.028
## cg02981548 49.96    8.975 59.003
## cg23432430 57.29   15.755 41.467
## cg14710850 54.50    8.365 46.074
## cg16749614 20.68   33.681 54.423
## cg07152869 48.29   54.287  5.935
## cg08857872 29.00   24.415 53.478
## cg16652920 27.03   25.381 52.479
## cg26948066 51.16   42.093  9.005
## PC3        12.10   38.688 50.850
## cg08861434 48.60    1.034 49.702
## cg27452255 49.50   29.759 19.674
## cg09584650 48.11   20.547 27.502
## cg11133939 31.91   15.802 47.780
## cg19503462 47.24   44.920  2.255
## cg06864789 20.57   46.482 25.848
## cg02372404 30.74   14.685 45.487
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG ==4 ||  METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
  
  
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

library(dplyr)

Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))


print(Ordered_importance_elastic_net_final_model1) 
  
}
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case, score each feature
  # by its maximum importance across the classes
  # Add a column for the maximum importance
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))


  print(importance_elastic_net_model1_df)
  
}
##              CN     Dementia        MCI    Feature MaxImportance
## 1   86.61856914 1.000000e+02 13.3172756        PC1   100.0000000
## 2   68.40923356 8.861718e+01 20.1437865        PC2    88.6171753
## 3   72.96469699 1.235880e+01 60.5417410 cg00962106    72.9646970
## 4   43.13240373 1.883119e+01 62.0277500 cg02225060    62.0277500
## 5   49.96445547 8.974834e+00 59.0034449 cg02981548    59.0034449
## 6   57.28594595 1.575519e+01 41.4665989 cg23432430    57.2859460
## 7   54.50276255 8.364596e+00 46.0740113 cg14710850    54.5027626
## 8   20.67722621 3.368145e+01 54.4228308 cg16749614    54.4228308
## 9   48.28784196 5.428685e+01  5.9348530 cg07152869    54.2868503
## 10  28.99955065 2.441458e+01 53.4782904 cg08857872    53.4782904
## 11  27.03411042 2.538099e+01 52.4792605 cg16652920    52.4792605
## 12  51.16151095 4.209271e+01  9.0046416 cg26948066    51.1615110
## 13  12.09810917 3.868799e+01 50.8502580        PC3    50.8502580
## 14  48.60398248 1.033556e+00 49.7016936 cg08861434    49.7016936
## 15  49.49760764 2.975942e+01 19.6740347 cg27452255    49.4976076
## 16  48.11260158 2.054659e+01 27.5018605 cg09584650    48.1126016
## 17  31.91327401 1.580218e+01 47.7796126 cg11133939    47.7796126
## 18  47.23884683 4.492017e+01  2.2545244 cg19503462    47.2388468
## 19  20.56964371 4.648222e+01 25.8484212 cg06864789    46.4822202
## 20  30.73782371 1.468467e+01 45.4866533 cg02372404    45.4866533
## 21  13.69354785 4.531509e+01 31.5573838 cg24859648    45.3150869
## 22  10.38234270 3.471979e+01 45.1662858 cg14527649    45.1662858
## 23  44.71198271 3.266239e+01 11.9854379 cg03982462    44.7119827
## 24  43.77730816 1.498709e+01 28.7260663 cg06536614    43.7773082
## 25   0.05883057 4.329808e+01 43.1750913 cg17186592    43.2980771
## 26  26.35356177 1.675128e+01 43.1689936 cg26219488    43.1689936
## 27  42.96184106 1.407974e+01 28.8179482 cg10240127    42.9618411
## 28  13.43728479 4.289995e+01 29.3985098 cg00247094    42.8999499
## 29  35.47454508 6.858665e+00 42.3973656 cg20685672    42.3973656
## 30   3.59352609 4.215343e+01 38.4957495 cg25259265    42.1534309
## 31  42.14101201 1.425672e+01 27.8201416 cg16715186    42.1410120
## 32   0.72471522 4.194004e+01 41.1511653 cg05096415    41.9400358
## 33  34.83387329 4.176063e+01  6.8626049 cg15775217    41.7606334
## 34  15.96588577 4.058679e+01 24.5567515 cg24861747    40.5867925
## 35  34.02582240 6.216602e+00 40.3065798 cg07028768    40.3065798
## 36   4.42923611 3.973188e+01 35.2384836 cg14924512    39.7318750
## 37  24.97857945 3.964211e+01 14.5993802 cg03084184    39.6421149
## 38   4.47364133 3.907237e+01 34.5345690 cg05570109    39.0723656
## 39  34.87838267 4.000343e+00 38.9428813 cg01921484    38.9428813
## 40   9.76109106 2.779419e+01 37.6194357 cg00154902    37.6194357
## 41  28.32371241 3.743846e+01  9.0505944 cg26757229    37.4384621
## 42  37.35409876 9.845548e+00 27.4443952 cg03660162    37.3540988
## 43  35.87862161 5.228450e-01 36.4656219 cg12228670    36.4656219
## 44   4.41997575 3.173978e+01 36.2239098 cg00616572    36.2239098
## 45  14.11765143 3.616344e+01 21.9816350 cg20507276    36.1634417
## 46   5.45749304 3.544622e+01 29.9245672 cg05841700    35.4462155
## 47  21.86529704 1.351351e+01 35.4429584 cg06715136    35.4429584
## 48  22.83396655 1.227262e+01 35.1707449 cg02621446    35.1707449
## 49  18.36248346 3.501828e+01 16.5916380 cg12738248    35.0182767
## 50  14.22710960 3.493641e+01 20.6451501 cg09854620    34.9364150
## 51  32.22376446 3.481855e+01  2.5306279 cg00322003    34.8185476
## 52   8.08384230 2.660522e+01 34.7532141 cg24873924    34.7532141
## 53  14.18047579 3.469812e+01 20.4534883 cg03129555    34.6981194
## 54  34.67696776 7.589721e+00 27.0230912 cg04412904    34.6769678
## 55  15.01146046 1.956788e+01 34.6434941 cg17738613    34.6434941
## 56  18.92284513 1.558633e+01 34.5733334 cg25879395    34.5733334
## 57  34.33931078 1.088425e+01 23.3909061 cg05234269    34.3393108
## 58  22.74938839 3.407273e+01 11.2591886 cg20913114    34.0727323
## 59   1.10552969 3.256819e+01 33.7378725 cg02494911    33.7378725
## 60  17.46538209 3.350897e+01 15.9794332 cg00675157    33.5089705
## 61  26.90358032 3.346119e+01  6.4934510 cg12279734    33.4611866
## 62  12.80983950 2.054797e+01 33.4219666 cg01153376    33.4219666
## 63  30.28967663 2.971189e+00 33.3250209 cg04248279    33.3250209
## 64  30.64200051 3.320655e+01  2.5003923 cg06697310    33.2065481
## 65  19.19843226 1.362877e+01 32.8913537 cg16771215    32.8913537
## 66  25.57003042 3.288938e+01  7.2551937 cg26474732    32.8893794
## 67   1.21314270 3.269567e+01 31.4183733 cg12534577    32.6956712
## 68  14.55313738 3.243791e+01 17.8206148 cg06378561    32.4379075
## 69  19.18973566 1.316038e+01 32.4142742 cg18819889    32.4142742
## 70  29.77270902 3.221745e+01  2.3805896 cg01013522    32.2174539
## 71   8.93772126 2.320996e+01 32.2118330 cg10369879    32.2118330
## 72  31.33577329 9.313653e+00 21.9579652 cg03327352    31.3357733
## 73  31.29967812 8.695863e+00 22.5396602 cg07138269    31.2996781
## 74  30.28028219 7.130515e-01 31.0574889 cg12146221    31.0574889
## 75  31.01600323 1.154261e+01 19.4092367 cg11227702    31.0160032
## 76  30.51131704 2.020539e-01 30.7775262 cg27577781    30.7775262
## 77  30.73303248 2.929545e+01  1.3734260 cg02356645    30.7330325
## 78  10.88695480 1.960658e+01 30.5576906 cg15865722    30.5576906
## 79  21.12814755 3.052680e+01  9.3344960 cg18339359    30.5267988
## 80  21.72224653 3.049841e+01  8.7120057 cg08584917    30.4984075
## 81  30.47938340 1.623212e+01 14.1831103 cg15535896    30.4793834
## 82   9.34688271 3.034956e+01 20.9385240 cg01680303    30.3495620
## 83   0.66029731 2.956642e+01 30.2908744 cg01667144    30.2908744
## 84  17.55953473 2.993701e+01 12.3133187 cg07523188    29.9370087
## 85  12.71944904 1.708317e+01 29.8667786 cg21854924    29.8667786
## 86   9.98858571 2.974028e+01 19.6875423 cg10750306    29.7402832
## 87   5.72424689 2.961588e+01 23.8274785 cg16579946    29.6158807
## 88  29.45167177 5.867809e+00 23.5197079 cg11438323    29.4516718
## 89   7.89481912 2.936063e+01 21.4016585 cg18821122    29.3606329
## 90  13.47025445 1.551555e+01 29.0499630 cg01128042    29.0499630
## 91  12.43865614 1.650719e+01 29.0100007 cg14564293    29.0100007
## 92  28.70024447 4.432688e-01 28.1928204 cg08198851    28.7002445
## 93  25.92001930 2.699439e+00 28.6836137 cg00696044    28.6836137
## 94  28.64274404 7.484468e+00 21.0941208 cg17421046    28.6427440
## 95  28.22281533 1.423058e+01 13.9280826 cg11331837    28.2228153
## 96   4.57949131 2.318121e+01 27.8248553 cg12682323    27.8248553
## 97  27.75178280 2.314445e+01  4.5431752 cg02932958    27.7517828
## 98   2.23018238 2.770392e+01 25.4095774 cg23658987    27.7039151
## 99  13.54125012 1.405997e+01 27.6653736 cg07480176    27.6653736
## 100 18.99135526 8.561151e+00 27.6166619 cg10738648    27.6166619
## 101 23.24171267 4.223920e+00 27.5297883 cg03071582    27.5297883
## 102 27.50633915 1.371731e+01 13.7248754 cg25758034    27.5063392
## 103  8.31603304 1.850390e+01 26.8840917 cg06118351    26.8840917
## 104 26.47257188 2.668221e+01  0.1454858 cg19512141    26.6822130
## 105 15.77003229 2.662199e+01 10.7878011 cg23161429    26.6219887
## 106 13.97970860 2.639323e+01 12.3493705 cg11247378    26.3932344
## 107 18.58873769 7.684422e+00 26.3373146 cg20678988    26.3373146
## 108 14.36630731 1.154461e+01 25.9750683 cg27086157    25.9750683
## 109 25.84323391 9.775700e+00 16.0033784 cg03088219    25.8432339
## 110 13.62723686 2.527421e+01 11.5828134 cg22274273    25.2742056
## 111  2.73030181 2.236070e+01 25.1551567 cg13885788    25.1551567
## 112  7.96947771 1.668187e+01 24.7155013 cg14240646    24.7155013
## 113 23.64664608 7.870673e-01 24.4978687 cg06112204    24.4978687
## 114 24.37581284 4.909569e+00 19.4020885 cg17429539    24.3758128
## 115 23.05210026 2.435068e+01  1.2344261 cg25561557    24.3506817
## 116 21.11642846 3.134820e+00 24.3154035 cg14293999    24.3154035
## 117 15.52350094 8.639394e+00 24.2270504 cg19377607    24.2270504
## 118 21.13941824 2.411177e+01  2.9081939 cg06950937    24.1117674
## 119 24.09416705 4.091480e+00 19.9385319 cg25436480    24.0941671
## 120 14.61625621 9.014240e+00 23.6946512 cg00272795    23.6946512
## 121 10.00492471 1.338641e+01 23.4554942 cg12012426    23.4554942
## 122 23.38047442 1.718219e+01  6.1341289 cg05321907    23.3804744
## 123 23.15224963 9.972486e+00 13.1156084 cg20139683    23.1522496
## 124  0.72026425 2.312477e+01 22.3403456 cg26069044    23.1247652
## 125 21.02229411 2.241430e+01  1.3278555 cg23916408    22.4143048
## 126  0.60344482 2.222826e+01 21.5606612 cg27341708    22.2282613
## 127 15.96951803 2.220693e+01  6.1732548 cg13080267    22.2069281
## 128 21.85904275 1.296775e+00 20.4981121 cg27272246    21.8590428
## 129  0.95551426 2.184089e+01 20.8212223 cg12284872    21.8408918
## 130  2.40801550 2.169902e+01 19.2268469 cg00689685    21.6990177
## 131  2.00882816 2.152514e+01 19.4521562 cg16178271    21.5251397
## 132 21.27669027 8.123896e+00 13.0886385 cg21209485    21.2766903
## 133 20.58800214 1.058955e+01  9.9342921 cg24851651    20.5880021
## 134 20.33445521 7.327227e+00 12.9430731 cg21697769    20.3344552
## 135 20.32764749 6.212758e+00 14.0507346 cg04664583    20.3276475
## 136 14.63879152 1.993277e+01  5.2298202 cg00999469    19.9327670
## 137  2.26826077 1.742733e+01 19.7597458 cg20370184    19.7597458
## 138 18.98018869 4.183448e+00 14.7325852 cg11187460    18.9801887
## 139 18.43492519 1.997626e+00 16.3731437 cg12784167    18.4349252
## 140  1.20049655 1.698226e+01 18.2469088 cg02320265    18.2469088
## 141 17.49071207 1.357646e+01  3.8500929 cg12776173    17.4907121
## 142 17.27595583 1.270576e+00 15.9412248 cg08779649    17.2759558
## 143  8.18137127 8.987929e+00 17.2334557 cg01933473    17.2334557
## 144 17.18292416 8.949228e+00  8.1695414 cg15501526    17.1829242
## 145 13.77316373 1.693296e+01  3.0956441 cg10985055    16.9329631
## 146 16.16347361 6.749371e+00  9.3499476 cg17906851    16.1634736
## 147 11.29815350 4.706969e+00 16.0692781 cg14307563    16.0692781
## 148  4.33181096 1.431018e+01  9.9142134 cg16788319    14.3101796
## 149 11.34767824 1.384098e+01  2.4291435 cg24506579    13.8409770
## 150  9.52173130 1.242049e+01  2.8346083 cg27639199    12.4204949
## 151  1.91285375 1.029383e+01 12.2708378 cg12466610    12.2708378
## 152  9.00293247 2.188499e+00 11.2555867 cg15633912    11.2555867
## 153  0.00000000 1.116694e+01 11.2310930 cg01413796    11.2310930
## 154  1.45721713 1.876081e-01  1.7089805 cg01549082     1.7089805
## 155  0.70664419 5.164989e-03  0.7759644    age.now     0.7759644
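MaxImportance is only one aggregation choice; the version notes also mention mean- and median-based selection. A minimal sketch of those alternatives in base R, assuming the same three class columns (the data frame here is a hypothetical slice of a `varImp()` table):

```r
# Hypothetical slice of a caret varImp() importance table
imp_df <- data.frame(
  CN       = c(86.6, 68.4, 73.0),
  Dementia = c(100.0, 88.6, 12.4),
  MCI      = c(13.3, 20.1, 60.5),
  Feature  = c("PC1", "PC2", "cg00962106")
)

class_cols <- c("CN", "Dementia", "MCI")

# Mean and median importance across classes, then rank by the mean
imp_df$MeanImportance   <- rowMeans(imp_df[, class_cols])
imp_df$MedianImportance <- apply(imp_df[, class_cols], 1, median)
imp_df <- imp_df[order(-imp_df$MeanImportance), ]
```

Mean aggregation rewards features that matter for every class, while the max used above keeps features that are strongly class-specific.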
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_elastic_net_model1_df,n=20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##          CN   Dementia       MCI    Feature MaxImportance
## 1  86.61857 100.000000 13.317276        PC1     100.00000
## 2  68.40923  88.617175 20.143786        PC2      88.61718
## 3  72.96470  12.358801 60.541741 cg00962106      72.96470
## 4  43.13240  18.831191 62.027750 cg02225060      62.02775
## 5  49.96446   8.974834 59.003445 cg02981548      59.00344
## 6  57.28595  15.755192 41.466599 cg23432430      57.28595
## 7  54.50276   8.364596 46.074011 cg14710850      54.50276
## 8  20.67723  33.681449 54.422831 cg16749614      54.42283
## 9  48.28784  54.286850  5.934853 cg07152869      54.28685
## 10 28.99955  24.414585 53.478290 cg08857872      53.47829
## 11 27.03411  25.380995 52.479260 cg16652920      52.47926
## 12 51.16151  42.092714  9.004642 cg26948066      51.16151
## 13 12.09811  38.687994 50.850258        PC3      50.85026
## 14 48.60398   1.033556 49.701694 cg08861434      49.70169
## 15 49.49761  29.759418 19.674035 cg27452255      49.49761
## 16 48.11260  20.546586 27.501861 cg09584650      48.11260
## 17 31.91327  15.802183 47.779613 cg11133939      47.77961
## 18 47.23885  44.920167  2.254524 cg19503462      47.23885
## 19 20.56964  46.482220 25.848421 cg06864789      46.48222
## 20 30.73782  14.684674 45.486653 cg02372404      45.48665
## [1] "the top 20 features based on max way:"
##  [1] "PC1"        "PC2"        "cg00962106" "cg02225060" "cg02981548" "cg23432430" "cg14710850"
##  [8] "cg16749614" "cg07152869" "cg08857872" "cg16652920" "cg26948066" "PC3"        "cg08861434"
## [15] "cg27452255" "cg09584650" "cg11133939" "cg19503462" "cg06864789" "cg02372404"

# For the binary-classification flags the three branches differ only in the
# positive class used for the ROC curve, so select it once and run one block.
binary_positive_class <- switch(as.character(METHOD_FEATURE_FLAG),
                                "3" = "CI",
                                "4" = "Dementia",
                                "5" = "MCI",
                                "6" = "Dementia")
if (!is.null(binary_positive_class)) {

  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX,
                   prob_predictions[, binary_positive_class],
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_ENM1_AUC <- auc_value
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")

}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)

  # One-versus-rest: one binary ROC curve and AUC per class
  for (class in classes) {
    binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  # One colour per class, matching the legend order
  plot(roc_curves[[1]], col = 2, lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8682
## The AUC value for class CN is: 0.8681699 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8656
## The AUC value for class Dementia is: 0.8655844 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8361
## The AUC value for class MCI is: 0.8361272

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_ENM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8566272
print(FeatEval_Median_ENM1_AUC)
## [1] 0.8566272
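The one-versus-rest macro-averaged AUC reported above can be illustrated without pROC using the rank-sum (Mann-Whitney) form of AUC. This is a minimal sketch on toy scores, not the ADNI predictions; the function and variable names here are hypothetical.

```r
# Rank-based one-versus-rest AUC: P(score_pos > score_neg) + 0.5 * P(tie)
ovr_auc <- function(labels, scores, positive) {
  pos <- scores[labels == positive]
  neg <- scores[labels != positive]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "==")) 
}

# Toy data: six samples, predicted probability of class "MCI"
labels    <- c("CN", "CN", "MCI", "MCI", "Dementia", "Dementia")
score_MCI <- c(0.1, 0.2, 0.9, 0.8, 0.3, 0.4)

auc_MCI <- ovr_auc(labels, score_MCI, "MCI")
print(auc_MCI)  # perfectly separated toy example -> 1
```

The macro-averaged figure printed above is then simply `mean()` over the per-class AUCs, as in the `METHOD_FEATURE_FLAG == 1` branch.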

9.2.4. XGBoost

9.2.4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa    
##   0.3  1          0.6               0.50        50      0.5736062  0.2133427
##   0.3  1          0.6               0.50       100      0.5582439  0.2085935
##   0.3  1          0.6               0.50       150      0.5625917  0.2244108
##   0.3  1          0.6               0.75        50      0.5407347  0.1472015
##   0.3  1          0.6               0.75       100      0.5540176  0.1792182
##   0.3  1          0.6               0.75       150      0.5628333  0.2063006
##   0.3  1          0.6               1.00        50      0.5231746  0.1106318
##   0.3  1          0.6               1.00       100      0.5474232  0.1678977
##   0.3  1          0.6               1.00       150      0.5606339  0.2029962
##   0.3  1          0.8               0.50        50      0.5890880  0.2424088
##   0.3  1          0.8               0.50       100      0.5934124  0.2672959
##   0.3  1          0.8               0.50       150      0.5868408  0.2642353
##   0.3  1          0.8               0.75        50      0.5430514  0.1555119
##   0.3  1          0.8               0.75       100      0.5406843  0.1665257
##   0.3  1          0.8               0.75       150      0.5605617  0.2088901
##   0.3  1          0.8               1.00        50      0.5342358  0.1288615
##   0.3  1          0.8               1.00       100      0.5407804  0.1556951
##   0.3  1          0.8               1.00       150      0.5584116  0.2014657
##   0.3  2          0.6               0.50        50      0.5538228  0.1995945
##   0.3  2          0.6               0.50       100      0.5692796  0.2243763
##   0.3  2          0.6               0.50       150      0.5736752  0.2365566
##   0.3  2          0.6               0.75        50      0.5494283  0.1713509
##   0.3  2          0.6               0.75       100      0.5736768  0.2237137
##   0.3  2          0.6               0.75       150      0.5604895  0.2040200
##   0.3  2          0.6               1.00        50      0.5385587  0.1501599
##   0.3  2          0.6               1.00       100      0.5516255  0.1817362
##   0.3  2          0.6               1.00       150      0.5737007  0.2208991
##   0.3  2          0.8               0.50        50      0.5625195  0.2024290
##   0.3  2          0.8               0.50       100      0.5779763  0.2363713
##   0.3  2          0.8               0.50       150      0.5824197  0.2476196
##   0.3  2          0.8               0.75        50      0.5867920  0.2509766
##   0.3  2          0.8               0.75       100      0.5934342  0.2573927
##   0.3  2          0.8               0.75       150      0.5846902  0.2431550
##   0.3  2          0.8               1.00        50      0.5451999  0.1662079
##   0.3  2          0.8               1.00       100      0.5627834  0.1963118
##   0.3  2          0.8               1.00       150      0.5539683  0.1831114
##   0.3  3          0.6               0.50        50      0.5649350  0.1945959
##   0.3  3          0.6               0.50       100      0.5713351  0.2177009
##   0.3  3          0.6               0.50       150      0.5846435  0.2435426
##   0.3  3          0.6               0.75        50      0.5582439  0.1912471
##   0.3  3          0.6               0.75       100      0.5693056  0.2148966
##   0.3  3          0.6               0.75       150      0.5627849  0.2058365
##   0.3  3          0.6               1.00        50      0.5538971  0.1741107
##   0.3  3          0.6               1.00       100      0.5626889  0.1948407
##   0.3  3          0.6               1.00       150      0.5583416  0.1924154
##   0.3  3          0.8               0.50        50      0.5781712  0.2266612
##   0.3  3          0.8               0.50       100      0.5627117  0.2093377
##   0.3  3          0.8               0.50       150      0.5671322  0.2231162
##   0.3  3          0.8               0.75        50      0.5648134  0.1892441
##   0.3  3          0.8               0.75       100      0.5889659  0.2431679
##   0.3  3          0.8               0.75       150      0.5802469  0.2286026
##   0.3  3          0.8               1.00        50      0.5671567  0.2005874
##   0.3  3          0.8               1.00       100      0.5671567  0.2028428
##   0.3  3          0.8               1.00       150      0.5781462  0.2285191
##   0.4  1          0.6               0.50        50      0.5428810  0.1759217
##   0.4  1          0.6               0.50       100      0.5472050  0.1993525
##   0.4  1          0.6               0.50       150      0.5560694  0.2141393
##   0.4  1          0.6               0.75        50      0.5342602  0.1478947
##   0.4  1          0.6               0.75       100      0.5870813  0.2513222
##   0.4  1          0.6               0.75       150      0.5828056  0.2515964
##   0.4  1          0.6               1.00        50      0.5386797  0.1427571
##   0.4  1          0.6               1.00       100      0.5605850  0.2101882
##   0.4  1          0.6               1.00       150      0.5583384  0.2008624
##   0.4  1          0.8               0.50        50      0.5561193  0.1990791
##   0.4  1          0.8               0.50       100      0.5493794  0.1975600
##   0.4  1          0.8               0.50       150      0.5537978  0.2081486
##   0.4  1          0.8               0.75        50      0.5473759  0.1714033
##   0.4  1          0.8               0.75       100      0.5518182  0.1871054
##   0.4  1          0.8               0.75       150      0.5759229  0.2413895
##   0.4  1          0.8               1.00        50      0.5539927  0.1727786
##   0.4  1          0.8               1.00       100      0.5650295  0.2081764
##   0.4  1          0.8               1.00       150      0.5627834  0.2115588
##   0.4  2          0.6               0.50        50      0.5736529  0.2303916
##   0.4  2          0.6               0.50       100      0.5670829  0.2220532
##   0.4  2          0.6               0.50       150      0.5758024  0.2386951
##   0.4  2          0.6               0.75        50      0.5584361  0.1926618
##   0.4  2          0.6               0.75       100      0.5761661  0.2324351
##   0.4  2          0.6               0.75       150      0.5716733  0.2229305
##   0.4  2          0.6               1.00        50      0.5516494  0.1728691
##   0.4  2          0.6               1.00       100      0.5736046  0.2204737
##   0.4  2          0.6               1.00       150      0.5802707  0.2332756
##   0.4  2          0.8               0.50        50      0.5562388  0.1972760
##   0.4  2          0.8               0.50       100      0.5539199  0.1963172
##   0.4  2          0.8               0.50       150      0.5583394  0.2088900
##   0.4  2          0.8               0.75        50      0.5452015  0.1706634
##   0.4  2          0.8               0.75       100      0.5649090  0.2141284
##   0.4  2          0.8               0.75       150      0.5672034  0.2176628
##   0.4  2          0.8               1.00        50      0.5671561  0.2076591
##   0.4  2          0.8               1.00       100      0.5824685  0.2350920
##   0.4  2          0.8               1.00       150      0.5845941  0.2462171
##   0.4  3          0.6               0.50        50      0.5823719  0.2396678
##   0.4  3          0.6               0.50       100      0.6021776  0.2854283
##   0.4  3          0.6               0.50       150      0.5735563  0.2354364
##   0.4  3          0.6               0.75        50      0.5848123  0.2435805
##   0.4  3          0.6               0.75       100      0.5760195  0.2307819
##   0.4  3          0.6               0.75       150      0.5780968  0.2322680
##   0.4  3          0.6               1.00        50      0.5868174  0.2402022
##   0.4  3          0.6               1.00       100      0.5890625  0.2540586
##   0.4  3          0.6               1.00       150      0.5869125  0.2498853
##   0.4  3          0.8               0.50        50      0.5627605  0.2116175
##   0.4  3          0.8               0.50       100      0.5891591  0.2611647
##   0.4  3          0.8               0.50       150      0.5869613  0.2567942
##   0.4  3          0.8               0.75        50      0.5868142  0.2422454
##   0.4  3          0.8               0.75       100      0.5889643  0.2511938
##   0.4  3          0.8               0.75       150      0.5889887  0.2505491
##   0.4  3          0.8               1.00        50      0.5650804  0.1936825
##   0.4  3          0.8               1.00       100      0.5627377  0.1949545
##   0.4  3          0.8               1.00       150      0.5715294  0.2145625
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter
##  'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 100, max_depth = 3, eta = 0.4, gamma =
##  0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.5662996
FeatEval_Median_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Median_mean_accuracy_cv_xgb)
## [1] 0.5662996
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Median_xgb_trainAccuracy <- train_accuracy

print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Median_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Median_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Median_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       33        8  15
##   Dementia  3        6   2
##   MCI      30       14  82
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6269          
##                  95% CI : (0.5546, 0.6953)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 0.0009246       
##                                           
##                   Kappa : 0.331           
##                                           
##  Mcnemar's Test P-Value : 0.0009969       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.5000         0.21429     0.8283
## Specificity             0.8189         0.96970     0.5319
## Pos Pred Value          0.5893         0.54545     0.6508
## Neg Pred Value          0.7591         0.87912     0.7463
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.1710         0.03109     0.4249
## Detection Prevalence    0.2902         0.05699     0.6528
## Balanced Accuracy       0.6594         0.59199     0.6801
cm_FeatEval_Median_xgb_Accuracy <-cm_FeatEval_Median_xgb$overall["Accuracy"]
cm_FeatEval_Median_xgb_Kappa <-cm_FeatEval_Median_xgb$overall["Kappa"]

print(cm_FeatEval_Median_xgb_Accuracy)
## Accuracy 
## 0.626943
print(cm_FeatEval_Median_xgb_Kappa)
##     Kappa 
## 0.3309903
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 155)
## 
##            Overall
## age.now     100.00
## cg15501526   91.95
## cg16771215   86.45
## cg05234269   86.13
## cg25259265   75.79
## cg01921484   68.79
## cg03088219   68.21
## cg02981548   67.80
## cg00962106   66.03
## cg16652920   65.81
## cg01667144   63.79
## cg08857872   62.34
## cg07152869   61.76
## cg26948066   60.94
## cg01153376   59.72
## cg00154902   59.63
## cg10369879   59.32
## cg03084184   59.12
## cg18821122   56.96
## cg06864789   56.17
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain        Cover    Frequency   Importance
##          <char>        <num>        <num>        <num>        <num>
##   1:    age.now 2.112743e-02 0.0185555881 0.0107066381 2.112743e-02
##   2: cg15501526 1.943013e-02 0.0172464912 0.0092790864 1.943013e-02
##   3: cg16771215 1.827025e-02 0.0148549078 0.0114204140 1.827025e-02
##   4: cg05234269 1.820234e-02 0.0133119992 0.0071377587 1.820234e-02
##   5: cg25259265 1.602252e-02 0.0112318961 0.0064239829 1.602252e-02
##  ---                                                               
## 151: cg06112204 5.356297e-04 0.0008945313 0.0014275517 5.356297e-04
## 152: cg20370184 3.587610e-04 0.0006866937 0.0028551035 3.587610e-04
## 153: cg03071582 2.521750e-04 0.0009505033 0.0021413276 2.521750e-04
## 154:        PC2 2.233942e-04 0.0005041107 0.0014275517 2.233942e-04
## 155: cg12466610 3.889293e-05 0.0001385521 0.0007137759 3.889293e-05
stopCluster(c2)
registerDoSEQ()
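The ordered importance table printed above is what the Top-N selection described in the version notes draws from. A minimal sketch of extracting the Top-N feature names from a table shaped like the `xgb.importance()` result (the data frame here is toy; only the `Feature` and `Importance` column names mirror the printed output):

```r
# Toy importance table mimicking the xgb.importance() columns used above
imp <- data.frame(
  Feature    = c("age.now", "cg15501526", "cg16771215", "cg05234269"),
  Importance = c(0.021, 0.019, 0.018, 0.018)
)

# Keep the Top-N features by importance (N = 2 here for illustration)
top_n <- 2
top_features <- head(imp$Feature[order(-imp$Importance)], top_n)
print(top_features)  # "age.now" "cg15501526"
```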
# As for the elastic net, the binary-classification flags differ only in the
# positive class, so select it once and run a single ROC block.
binary_positive_class <- switch(as.character(METHOD_FEATURE_FLAG),
                                "3" = "CI",
                                "4" = "Dementia",
                                "5" = "MCI",
                                "6" = "Dementia")
if (!is.null(binary_positive_class)) {
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX,
                   prob_predictions[, binary_positive_class],
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_xgb_AUC <- auc_value
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)

  # One-versus-rest: one binary ROC curve and AUC per class
  for (class in classes) {
    binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  # One colour per class, matching the legend order
  plot(roc_curves[[1]], col = 2, lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.7048
## The AUC value for class CN is: 0.7048437 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.69
## The AUC value for class Dementia is: 0.6900433 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.6976
## The AUC value for class MCI is: 0.6976144

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_xgb_AUC <-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6975005
print(FeatEval_Median_xgb_AUC)
## [1] 0.6975005

9.2.5. Random Forest

9.2.5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1 <- processed_data
featureName_RFM1 <- AfterProcess_FeatureName

set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)  # was overwriting X_train_RFM1
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)

print(rf_model)
## Random Forest 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##     2   0.5209529  0.02147335
##    78   0.5516499  0.12431050
##   155   0.5560227  0.12959840
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 155.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.5428752
FeatEval_Median_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Median_mean_accuracy_cv_rf)
## [1] 0.5428752
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")

train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Median_rf_trainAccuracy<-train_accuracy
print(FeatEval_Median_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Median_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Median_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       19        7  12
##   Dementia  0        0   0
##   MCI      47       21  87
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5492          
##                  95% CI : (0.4761, 0.6208)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 0.1747          
##                                           
##                   Kappa : 0.1343          
##                                           
##  Mcnemar's Test P-Value : 1.465e-10       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity            0.28788          0.0000     0.8788
## Specificity            0.85039          1.0000     0.2766
## Pos Pred Value         0.50000             NaN     0.5613
## Neg Pred Value         0.69677          0.8549     0.6842
## Prevalence             0.34197          0.1451     0.5130
## Detection Rate         0.09845          0.0000     0.4508
## Detection Prevalence   0.19689          0.0000     0.8031
## Balanced Accuracy      0.56914          0.5000     0.5777
cm_FeatEval_Median_rf_Accuracy<-cm_FeatEval_Median_rf$overall["Accuracy"]
print(cm_FeatEval_Median_rf_Accuracy)
##  Accuracy 
## 0.5492228
cm_FeatEval_Median_rf_Kappa<-cm_FeatEval_Median_rf$overall["Kappa"]
print(cm_FeatEval_Median_rf_Kappa)
##    Kappa 
## 0.134306
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##               CN Dementia    MCI
## cg15501526 76.69    35.05 100.00
## age.now    47.72    45.85  76.28
## cg01153376 25.95    44.39  66.76
## cg06864789 30.44    62.72  14.64
## cg25259265 21.71    53.34  44.55
## cg12279734 53.34    46.14  13.15
## cg00962106 31.82    26.72  47.89
## cg15775217 47.85    37.29  23.55
## cg00247094 13.93    46.90  30.08
## cg09584650 29.42    46.70  31.56
## cg20685672 44.88    21.77  22.14
## cg07028768 24.85    13.81  44.41
## cg14564293 43.75    37.11  41.38
## cg05096415 29.33    42.98  28.25
## cg20507276 20.97    42.88  35.73
## cg16652920 23.84    12.07  42.66
## cg01128042 29.81    15.40  42.37
## cg05234269 29.99    41.57  36.94
## cg01667144 28.92    28.13  41.52
## cg26069044 20.75    31.78  41.44
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
# For the binary flags, order the final-model importance by the positive class;
# only the sorting column differs between the branches.
sort_class <- switch(as.character(METHOD_FEATURE_FLAG),
                     "3" = "CI",
                     "4" = "Dementia",
                     "5" = "MCI",
                     "6" = "Dementia")
if (!is.null(sort_class)) {

  importance_rf_final_model <- varImp(rf_model$finalModel)

  library(dplyr)
  Ordered_importance_rf_final_model <- importance_rf_final_model %>%
    arrange(desc(.data[[sort_class]]))

  print(Ordered_importance_rf_final_model)

}
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_rf_model_df)
  
}
##             CN  Dementia        MCI    Feature MaxImportance
## 1   76.6923099 35.053973 100.000000 cg15501526     100.00000
## 2   47.7155528 45.846153  76.278518    age.now      76.27852
## 3   25.9524160 44.393868  66.757310 cg01153376      66.75731
## 4   30.4368197 62.721527  14.644304 cg06864789      62.72153
## 5   21.7118768 53.342304  44.546949 cg25259265      53.34230
## 6   53.3366214 46.142628  13.150659 cg12279734      53.33662
## 7   31.8202922 26.717263  47.889227 cg00962106      47.88923
## 8   47.8486824 37.288439  23.548381 cg15775217      47.84868
## 9   13.9339856 46.900622  30.076314 cg00247094      46.90062
## 10  29.4219740 46.695776  31.560213 cg09584650      46.69578
## 11  44.8826960 21.768550  22.141293 cg20685672      44.88270
## 12  24.8468309 13.809502  44.406037 cg07028768      44.40604
## 13  43.7453305 37.110377  41.381288 cg14564293      43.74533
## 14  29.3320840 42.982247  28.245622 cg05096415      42.98225
## 15  20.9688504 42.882887  35.730460 cg20507276      42.88289
## 16  23.8446279 12.073446  42.661131 cg16652920      42.66113
## 17  29.8068016 15.399429  42.366636 cg01128042      42.36664
## 18  29.9885559 41.570634  36.937020 cg05234269      41.57063
## 19  28.9233442 28.131882  41.518101 cg01667144      41.51810
## 20  20.7459853 31.776151  41.443458 cg26069044      41.44346
## 21  27.6293426 40.871386  15.045489 cg01013522      40.87139
## 22  40.6837482 31.142879  24.998680 cg10240127      40.68375
## 23  14.7615919 40.620971  17.390796 cg09854620      40.62097
## 24  21.4395947 21.298022  40.483537 cg14240646      40.48354
## 25  40.4585390 28.565076  31.726655 cg11247378      40.45854
## 26  35.2263588 25.221218  40.430931 cg08857872      40.43093
## 27  29.5806004 21.639614  40.240150 cg17429539      40.24015
## 28  39.7359421 28.389481  31.332308 cg12776173      39.73594
## 29  14.5771133 26.625146  39.487658 cg14293999      39.48766
## 30  39.4502276 24.111916  29.108735 cg14710850      39.45023
## 31   7.0626094 39.019639  15.476174 cg16749614      39.01964
## 32  29.6583517 24.416012  38.897490 cg17738613      38.89749
## 33  27.9399920 38.653751  25.501184 cg07138269      38.65375
## 34  32.7092510  9.886174  38.523214 cg11187460      38.52321
## 35  38.4024400 32.177953  27.004240 cg10369879      38.40244
## 36  20.5454312 38.172773  38.261766 cg00154902      38.26177
## 37  13.8058741 38.207466  23.172401 cg19503462      38.20747
## 38  38.0187628 34.050182  17.865403 cg12228670      38.01876
## 39  37.2993793 35.104079  20.820271 cg01413796      37.29938
## 40  37.2767231 35.343056  29.001598 cg24861747      37.27672
## 41  20.5617528 37.263527  22.032914 cg04664583      37.26353
## 42  36.7092256 37.247823  11.616416 cg18819889      37.24782
## 43  26.2452981 25.114644  37.174705 cg26948066      37.17471
## 44  24.7888263 14.067769  37.079181 cg25879395      37.07918
## 45  27.9642639 36.766799  21.481868 cg25561557      36.76680
## 46  35.1299664 24.157696  36.742176 cg01921484      36.74218
## 47  31.9929330 27.438995  36.569860 cg02981548      36.56986
## 48  36.5546142 26.163146  33.587987 cg02225060      36.55461
## 49  12.1852932 18.781119  36.230248 cg15865722      36.23025
## 50  17.5108449 36.223769  22.633305 cg20913114      36.22377
## 51  11.8445071 36.205398  27.078280 cg12012426      36.20540
## 52  32.3957712 36.159733  29.995395 cg03084184      36.15973
## 53  32.6369694 15.770925  36.080998 cg10738648      36.08100
## 54  24.5945447 35.678602  35.999158 cg08861434      35.99916
## 55  35.9235758 22.004695  10.914928 cg03982462      35.92358
## 56  35.7889401 24.567638  24.186577 cg12146221      35.78894
## 57  26.9848456 35.383394  35.773419 cg27086157      35.77342
## 58  28.3767333 35.737187  32.778373 cg23161429      35.73719
## 59  35.5530586 28.523193  24.332180 cg05321907      35.55306
## 60  24.9052663 26.844921  35.358064 cg02621446      35.35806
## 61  22.8174056 31.668731  35.120230        PC1      35.12023
## 62  10.2440923 35.100606  26.696785 cg24873924      35.10061
## 63  10.1390641 24.645613  35.063982 cg02320265      35.06398
## 64  12.2961094 30.229128  35.017946 cg04248279      35.01795
## 65  19.1118773 34.658779  22.591672 cg00675157      34.65878
## 66  28.0565230 34.535561   0.000000 cg18339359      34.53556
## 67  24.3162728 34.507079  24.207233 cg00999469      34.50708
## 68  25.3401062 34.504523  23.477398 cg12534577      34.50452
## 69  34.4680648 19.685799  30.279707        PC2      34.46806
## 70  32.1984313 20.003301  34.347849 cg04412904      34.34785
## 71  34.2379960 33.891448  21.134538 cg02372404      34.23800
## 72   9.5926522 33.670583  19.463676 cg26474732      33.67058
## 73  29.7836802 24.996509  33.513906 cg15535896      33.51391
## 74  33.5021361 25.168032  21.381426 cg20139683      33.50214
## 75  33.4553005 23.403492  23.752550 cg13885788      33.45530
## 76  25.7402623 15.661728  33.435992 cg00696044      33.43599
## 77  24.5527336 24.506876  33.301503 cg18821122      33.30150
## 78  12.9207342 31.629984  33.293584 cg16771215      33.29358
## 79  19.3733484 27.138709  33.027811 cg17186592      33.02781
## 80  21.6900173 24.195451  32.649723 cg12738248      32.64972
## 81  24.4534514 26.529520  32.649513 cg23658987      32.64951
## 82  19.0738941 32.490892  18.472594 cg22274273      32.49089
## 83  15.9237341 14.289192  32.343544 cg25758034      32.34354
## 84  22.2692762 27.425613  31.747802 cg03327352      31.74780
## 85  26.6026298 18.823974  31.730728 cg11133939      31.73073
## 86  25.2898340 31.659280  23.256775 cg21209485      31.65928
## 87  28.2206526 31.654303  15.631104 cg02356645      31.65430
## 88  25.2545537 31.392922  17.764957 cg06536614      31.39292
## 89  19.4441777 25.193139  31.195570 cg24851651      31.19557
## 90  11.1124388 31.194517  26.448398 cg17906851      31.19452
## 91   6.8616647 27.121857  31.152981 cg06112204      31.15298
## 92  23.4993443 31.125283  19.681406 cg14527649      31.12528
## 93  22.9552695 30.421863  20.186686 cg26219488      30.42186
## 94   5.7595342 30.332789  23.060533 cg12284872      30.33279
## 95  18.2063080 30.067965  28.767916 cg03088219      30.06797
## 96  17.3967089 22.829945  30.016690 cg02494911      30.01669
## 97  12.7485088 29.958409  26.526103 cg12682323      29.95841
## 98  29.7889960 28.863462  27.932902 cg03071582      29.78900
## 99  26.4068755 29.568590  28.378530 cg10985055      29.56859
## 100  3.2101704 29.204393  14.095890 cg03129555      29.20439
## 101 29.0200111 15.549608  20.682407 cg00616572      29.02001
## 102 24.5139331 28.841978  28.902186 cg27341708      28.90219
## 103 19.8888388  7.102368  28.754540 cg15633912      28.75454
## 104 20.4842109 13.293009  28.568325 cg02932958      28.56832
## 105 28.4052159  8.971811   6.016374 cg06378561      28.40522
## 106 21.9280701 22.629981  28.311482 cg19377607      28.31148
## 107 17.1581799 28.298635  11.501730 cg11227702      28.29864
## 108 28.2398654 27.256889  19.929494 cg01680303      28.23987
## 109 27.9819786 12.035818  22.682467 cg06118351      27.98198
## 110 19.0419535 26.725237  27.738234 cg08198851      27.73823
## 111 13.7901367 26.104324  27.732541        PC3      27.73254
## 112 25.6708255 14.569499  27.706092 cg16715186      27.70609
## 113 14.5779068 27.692561   3.153006 cg00272795      27.69256
## 114 16.3139465  7.701157  27.652643 cg12784167      27.65264
## 115  0.8957984 27.592997  23.884488 cg14924512      27.59300
## 116 16.2600518 27.526059  26.359399 cg20678988      27.52606
## 117 22.2284073 27.513187  23.300478 cg23916408      27.51319
## 118 11.5655941 20.028505  27.458140 cg08779649      27.45814
## 119 27.2848766 15.314137  23.359170 cg07152869      27.28488
## 120 27.1149265 26.976261  19.540706 cg03660162      27.11493
## 121 19.7029012 22.955827  27.105937 cg19512141      27.10594
## 122 21.4209614 14.788613  27.004950 cg26757229      27.00495
## 123 22.0683430  7.923935  26.973580 cg00689685      26.97358
## 124 20.7773065 26.771188  25.171435 cg06950937      26.77119
## 125 20.7988609 26.347107  18.386969 cg06715136      26.34711
## 126 14.6761704 14.707912  26.314149 cg27272246      26.31415
## 127  7.6524253 26.276744  18.646134 cg24859648      26.27674
## 128 20.0815165 19.090487  26.217279 cg01933473      26.21728
## 129 15.8324823 16.951754  25.930281 cg21697769      25.93028
## 130 11.8201937 25.891902  23.329712 cg10750306      25.89190
## 131 25.7809473 23.550684  25.216088 cg23432430      25.78095
## 132 16.6259933 25.748940  25.747717 cg07523188      25.74894
## 133  9.0579625 25.661118  22.882843 cg07480176      25.66112
## 134 25.4265285 11.601544  13.411877 cg06697310      25.42653
## 135 21.6110435 25.155535  25.060500 cg27577781      25.15554
## 136 24.5023451 21.019052  16.860391 cg20370184      24.50235
## 137 10.1681090 14.831455  24.502091 cg16788319      24.50209
## 138 16.5321968 24.469138  12.302947 cg08584917      24.46914
## 139 24.3277743 11.685925  20.143261 cg05570109      24.32777
## 140 23.2146100 18.394328   7.510325 cg27452255      23.21461
## 141 21.2926607 23.057160  19.489054 cg24506579      23.05716
## 142 13.6215030 22.995195  18.351225 cg01549082      22.99519
## 143 22.9359146 20.481779  20.336097 cg11331837      22.93591
## 144 15.8523535 22.408087  12.218595 cg21854924      22.40809
## 145 22.2658369  4.384511  20.090717 cg11438323      22.26584
## 146 19.1366173 17.664706  21.810614 cg17421046      21.81061
## 147 21.0227393 18.873605  11.750042 cg16178271      21.02274
## 148 19.9206416 19.354994  18.389093 cg12466610      19.92064
## 149 18.3038153 19.892505  18.637289 cg05841700      19.89250
## 150  6.8288286 19.478153  17.614024 cg14307563      19.47815
## 151 19.2457721 13.327575  13.384317 cg25436480      19.24577
## 152 16.1924810 19.024224  18.409134 cg00322003      19.02422
## 153 10.2046806 17.707978  17.590101 cg13080267      17.70798
## 154  7.9240213 17.529300   9.739789 cg27639199      17.52930
## 155 13.7437050 16.592669  15.115893 cg16579946      16.59267
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("The top 20 features based on the max-importance method:")
  print(head(importance_rf_model_df,n=20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##          CN Dementia       MCI    Feature MaxImportance
## 1  76.69231 35.05397 100.00000 cg15501526     100.00000
## 2  47.71555 45.84615  76.27852    age.now      76.27852
## 3  25.95242 44.39387  66.75731 cg01153376      66.75731
## 4  30.43682 62.72153  14.64430 cg06864789      62.72153
## 5  21.71188 53.34230  44.54695 cg25259265      53.34230
## 6  53.33662 46.14263  13.15066 cg12279734      53.33662
## 7  31.82029 26.71726  47.88923 cg00962106      47.88923
## 8  47.84868 37.28844  23.54838 cg15775217      47.84868
## 9  13.93399 46.90062  30.07631 cg00247094      46.90062
## 10 29.42197 46.69578  31.56021 cg09584650      46.69578
## 11 44.88270 21.76855  22.14129 cg20685672      44.88270
## 12 24.84683 13.80950  44.40604 cg07028768      44.40604
## 13 43.74533 37.11038  41.38129 cg14564293      43.74533
## 14 29.33208 42.98225  28.24562 cg05096415      42.98225
## 15 20.96885 42.88289  35.73046 cg20507276      42.88289
## 16 23.84463 12.07345  42.66113 cg16652920      42.66113
## 17 29.80680 15.39943  42.36664 cg01128042      42.36664
## 18 29.98856 41.57063  36.93702 cg05234269      41.57063
## 19 28.92334 28.13188  41.51810 cg01667144      41.51810
## 20 20.74599 31.77615  41.44346 cg26069044      41.44346
## [1] "The top 20 features based on the max-importance method:"
##  [1] "cg15501526" "age.now"    "cg01153376" "cg06864789" "cg25259265" "cg12279734" "cg00962106"
##  [8] "cg15775217" "cg00247094" "cg09584650" "cg20685672" "cg07028768" "cg14564293" "cg05096415"
## [15] "cg20507276" "cg16652920" "cg01128042" "cg05234269" "cg01667144" "cg26069044"

if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue",
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # legend colours must match the curves: "blue" first, then i + 1
  legend("bottomright", legend = classes,
         col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)

   
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.6969
## The AUC value for class CN is: 0.6968504 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.5867
## The AUC value for class Dementia is: 0.5866883 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.641
## The AUC value for class MCI is: 0.641038

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_rf_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6415256
print(FeatEval_Median_rf_AUC)
## [1] 0.6415256
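The same one-versus-rest loop is repeated below for the SVM model. The computation can be factored into a small helper; the sketch below uses the rank-based (Mann-Whitney) formula so it is base-R only, which agrees with the `pROC` AUC up to tie handling. `ovr_auc` and `macro_ovr_auc` are hypothetical names, not defined elsewhere in this document.

```r
# One-vs-rest AUC via the rank-based (Mann-Whitney) formula, base R only.
ovr_auc <- function(binary_labels, scores) {
  r <- rank(scores)                          # mid-ranks handle ties
  n_pos <- sum(binary_labels == 1)
  n_neg <- sum(binary_labels == 0)
  (sum(r[binary_labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Macro-average over the classes of a probability matrix
# (rows = samples, columns = class probabilities).
macro_ovr_auc <- function(truth, prob_matrix) {
  aucs <- vapply(colnames(prob_matrix), function(cl) {
    ovr_auc(as.integer(truth == cl), prob_matrix[, cl])
  }, numeric(1))
  c(aucs, macro = mean(aucs))
}
```

For example, `macro_ovr_auc(test_data_RFM1$DX, prob_predictions)` should reproduce the per-class and mean AUC values printed above, up to tie handling.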

9.2.6 SVM

9.2.6.1 SVM Model Training

df_SVM <- processed_data
featureName_SVM1 <- AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1, select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select = -DX)
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 364, 364, 364, 365 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.6725089  0.4667618
##   0.50  0.6769045  0.4687381
##   1.00  0.6813012  0.4718549
## 
## Tuning parameter 'sigma' was held constant at a value of 0.003284607
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.003284607 and C = 1.
print(svm_model$bestTune)
##         sigma C
## 3 0.003284607 1
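`caret` held `sigma` at a single data-driven estimate and only varied `C` over three values. A wider search could be requested through `tuneGrid`; the sketch below is illustrative (the grid values are not tuned), and `caret` expects the columns to be named exactly `sigma` and `C` for `method = "svmRadial"`.

```r
# Hypothetical wider tuning grid for svmRadial.
tune_grid <- expand.grid(sigma = c(0.001, 0.003, 0.01),
                         C = c(0.25, 0.5, 1, 2, 4))
nrow(tune_grid)  # 15 candidate (sigma, C) pairs
# svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
#                           method = "svmRadial",
#                           trControl = train_control,
#                           tuneGrid = tune_grid)
```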
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.6769049
FeatEval_Median_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Median_mean_accuracy_cv_svm)
## [1] 0.6769049
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.953846153846154"
FeatEval_Median_svm_trainAccuracy <- train_accuracy
print(FeatEval_Median_svm_trainAccuracy)
## [1] 0.9538462
predictions <- predict(svm_model, newdata = test_data_SVM1)

cm_FeatEval_Median_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Median_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       44        3  12
##   Dementia  5       17   4
##   MCI      17        8  83
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7461          
##                  95% CI : (0.6786, 0.8059)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 2.736e-11       
##                                           
##                   Kappa : 0.5689          
##                                           
##  Mcnemar's Test P-Value : 0.441           
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6667         0.60714     0.8384
## Specificity             0.8819         0.94545     0.7340
## Pos Pred Value          0.7458         0.65385     0.7685
## Neg Pred Value          0.8358         0.93413     0.8118
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2280         0.08808     0.4301
## Detection Prevalence    0.3057         0.13472     0.5596
## Balanced Accuracy       0.7743         0.77630     0.7862
cm_FeatEval_Median_svm_Accuracy <- cm_FeatEval_Median_svm$overall["Accuracy"]
cm_FeatEval_Median_svm_Kappa <- cm_FeatEval_Median_svm$overall["Kappa"]
print(cm_FeatEval_Median_svm_Accuracy)
## Accuracy 
## 0.746114
print(cm_FeatEval_Median_svm_Kappa)
##     Kappa 
## 0.5688625
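The per-class statistics in the printout can be recovered from the confusion-matrix counts directly. The base-R sketch below re-enters the table printed above (rows = predicted, columns = reference) and reproduces the Sensitivity, Specificity, and Balanced Accuracy rows; `caret` exposes the same numbers via `cm_FeatEval_Median_svm$byClass`.

```r
# Confusion matrix from the printout above (rows = predicted, cols = reference).
cm <- matrix(c(44,  3, 12,
                5, 17,  4,
               17,  8, 83),
             nrow = 3, byrow = TRUE,
             dimnames = list(pred = c("CN", "Dementia", "MCI"),
                             ref  = c("CN", "Dementia", "MCI")))
accuracy    <- sum(diag(cm)) / sum(cm)     # 144 / 193, matches 0.7461 above
sensitivity <- diag(cm) / colSums(cm)      # recall per reference class
specificity <- sapply(1:3, function(i) sum(cm[-i, -i]) / sum(cm[, -i]))
balanced_accuracy <- (sensitivity + specificity) / 2
```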

Let’s take a look at the feature importance of the trained model.

library(iml)
predictor_SVM <- Predictor$new(svm_model, data = df_SVM, y = df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM, loss = "ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 648 rows and 156 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg05234269      1.028571   1.100000      1.125714         0.1188272
## 2 cg24851651      1.057143   1.100000      1.122857         0.1188272
## 3 cg04248279      1.060000   1.085714      1.085714         0.1172840
## 4        PC1      1.034286   1.071429      1.105714         0.1157407
## 5 cg02225060      1.028571   1.071429      1.111429         0.1157407
## 6 cg11133939      1.057143   1.071429      1.097143         0.1157407
plot(importance_SVM)

library(vip)

vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX",
    nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  FeatEval_Median_svm_AUC <- auc_value

  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  FeatEval_Median_svm_AUC <- auc_value

  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc
  FeatEval_Median_svm_AUC <- auc_value

  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_SVM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue",
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  # legend colours must match the curves: "blue" first, then i + 1
  legend("bottomright", legend = classes,
         col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)

   
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.5389
## The AUC value for class CN is: 0.5388929 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) > 28 cases (binary_labels 1).
## Area under the curve: 0.5162
## The AUC value for class Dementia is: 0.5162338 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) > 99 cases (binary_labels 1).
## Area under the curve: 0.5044
## The AUC value for class MCI is: 0.5044058

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_svm_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.5198441
print(FeatEval_Median_svm_AUC )
## [1] 0.5198441

9.3 Selected Based on Frequency

9.3.1 Input Feature For Evaluation

Evaluate the performance of the output features selected by the frequency method.
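As a reminder of the rule described in the version notes: take the top-N features from each trained model, count how often each feature appears across models, and keep the ones appearing in more than half of the models as "common" features. A minimal base-R sketch (the model names and feature lists below are illustrative only, not the real upstream lists):

```r
# Illustrative top features per model (not the real upstream lists).
top_features_per_model <- list(
  rf  = c("cg15501526", "age.now", "cg01153376"),
  svm = c("cg15501526", "cg05234269", "age.now"),
  en  = c("cg15501526", "age.now", "cg24851651")
)
n_models <- length(top_features_per_model)
feature_counts  <- table(unlist(top_features_per_model))      # appearance counts
common_features <- names(feature_counts)[feature_counts > n_models / 2]
print(common_features)  # features kept by the "more than half" rule
```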

processed_dataFrame <- df_process_Output_freq
processed_data <- output_Frequency_Feature

AfterProcess_FeatureName <- df_process_frequency_FeatureName
print(head(output_Frequency_Feature))
## # A tibble: 6 × 156
##   DX            PC1        PC2      PC3 cg00962106 cg02225060 cg14710850 cg27452255 cg02981548
##   <fct>       <dbl>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 MCI      -0.214    0.0147    -0.0140       0.912      0.683      0.805      0.900      0.134
## 2 CN       -0.173    0.0575     0.00506      0.538      0.827      0.809      0.659      0.522
## 3 CN       -0.00367  0.0837     0.0291       0.504      0.521      0.829      0.901      0.510
## 4 Dementia -0.187   -0.0112    -0.0323       0.904      0.808      0.834      0.890      0.566
## 5 MCI       0.0268   0.0000165  0.0529       0.896      0.608      0.850      0.578      0.568
## 6 CN       -0.0379   0.0157    -0.00869      0.886      0.764      0.821      0.881      0.508
## # ℹ 147 more variables: cg08861434 <dbl>, cg19503462 <dbl>, cg07152869 <dbl>, cg16749614 <dbl>,
## #   cg05096415 <dbl>, cg23432430 <dbl>, cg17186592 <dbl>, cg00247094 <dbl>, cg09584650 <dbl>,
## #   cg11133939 <dbl>, cg16715186 <dbl>, cg03129555 <dbl>, cg08857872 <dbl>, cg06864789 <dbl>,
## #   cg14924512 <dbl>, cg16652920 <dbl>, cg03084184 <dbl>, cg26219488 <dbl>, cg20913114 <dbl>,
## #   cg06378561 <dbl>, cg26948066 <dbl>, cg25259265 <dbl>, cg06536614 <dbl>, cg24859648 <dbl>,
## #   cg12279734 <dbl>, cg03982462 <dbl>, cg05841700 <dbl>, cg11227702 <dbl>, cg12146221 <dbl>,
## #   cg02621446 <dbl>, cg00616572 <dbl>, cg15535896 <dbl>, cg02372404 <dbl>, cg09854620 <dbl>, …
print(df_process_frequency_FeatureName)
##   [1] "PC1"        "PC2"        "PC3"        "cg00962106" "cg02225060" "cg14710850" "cg27452255"
##   [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
##  [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555" "cg08857872"
##  [22] "cg06864789" "cg14924512" "cg16652920" "cg03084184" "cg26219488" "cg20913114" "cg06378561"
##  [29] "cg26948066" "cg25259265" "cg06536614" "cg24859648" "cg12279734" "cg03982462" "cg05841700"
##  [36] "cg11227702" "cg12146221" "cg02621446" "cg00616572" "cg15535896" "cg02372404" "cg09854620"
##  [43] "cg04248279" "cg20678988" "cg24861747" "cg10240127" "cg16771215" "cg01667144" "cg13080267"
##  [50] "cg02494911" "cg10750306" "cg11438323" "cg06715136" "cg04412904" "cg12738248" "cg03071582"
##  [57] "cg05570109" "cg15775217" "cg24873924" "cg17738613" "cg01921484" "cg10369879" "cg27341708"
##  [64] "cg12534577" "cg18821122" "cg12682323" "cg05234269" "cg20685672" "cg12228670" "cg11331837"
##  [71] "cg01680303" "cg17421046" "cg03088219" "cg02356645" "cg00322003" "cg01013522" "cg00272795"
##  [78] "cg25758034" "cg26474732" "cg16579946" "cg07523188" "cg11187460" "cg14527649" "cg20370184"
##  [85] "cg17429539" "cg20507276" "cg13885788" "cg16178271" "cg10738648" "cg26069044" "cg25879395"
##  [92] "cg06112204" "cg23161429" "cg25436480" "cg26757229" "cg02932958" "cg18339359" "cg23916408"
##  [99] "cg06950937" "cg12784167" "cg07480176" "cg15865722" "cg27577781" "cg05321907" "cg03660162"
## [106] "cg07138269" "cg20139683" "cg12284872" "cg03327352" "cg23658987" "cg21854924" "cg21697769"
## [113] "cg19512141" "cg08198851" "cg00675157" "cg01153376" "cg01933473" "cg12776173" "cg14564293"
## [120] "cg24851651" "cg22274273" "cg25561557" "cg21209485" "cg10985055" "cg14293999" "cg18819889"
## [127] "cg24506579" "cg19377607" "cg06697310" "cg00696044" "cg01549082" "cg01128042" "cg00999469"
## [134] "cg06118351" "cg12012426" "cg08584917" "cg27272246" "cg15633912" "cg16788319" "cg17906851"
## [141] "cg07028768" "cg27086157" "cg14240646" "cg00154902" "cg14307563" "cg02320265" "cg08779649"
## [148] "cg04664583" "cg12466610" "cg27639199" "cg15501526" "cg00689685" "cg01413796" "cg11247378"
## [155] "age.now"
print(length(df_process_frequency_FeatureName))
## [1] 155
Num_KeyFea_Frequency <- length(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
##                           DX          PC1           PC2          PC3 cg00962106 cg02225060
## 200223270003_R02C01      MCI -0.214185447  1.470293e-02 -0.014043316  0.9124898  0.6828159
## 200223270003_R03C01       CN -0.172761185  5.745834e-02  0.005055871  0.5375751  0.8265195
## 200223270003_R06C01       CN -0.003667305  8.372861e-02  0.029143653  0.5040948  0.5209552
## 200223270003_R07C01 Dementia -0.186779607 -1.117250e-02 -0.032302430  0.9039029  0.8078889
## 200223270006_R01C01      MCI  0.026814649  1.650735e-05  0.052947950  0.8961556  0.6084903
## 200223270006_R04C01       CN -0.037862929  1.571950e-02 -0.008685676  0.8857597  0.7638781
##                     cg14710850 cg27452255 cg02981548 cg08861434 cg19503462 cg07152869
## 200223270003_R02C01  0.8048592  0.9001010  0.1342571  0.8768306  0.7951675  0.8284151
## 200223270003_R03C01  0.8090950  0.6593379  0.5220037  0.4352647  0.4537684  0.5050630
## 200223270003_R06C01  0.8285902  0.9012217  0.5098965  0.8698813  0.6997359  0.8352490
## 200223270003_R07C01  0.8336457  0.8898635  0.5660985  0.4709249  0.7189778  0.5194300
## 200223270006_R01C01  0.8500725  0.5779792  0.5678714  0.8618532  0.7301755  0.5025709
## 200223270006_R04C01  0.8207247  0.8809143  0.5079859  0.9058965  0.4207207  0.8080916
##                     cg16749614 cg05096415 cg23432430 cg17186592 cg00247094 cg09584650
## 200223270003_R02C01  0.8678741  0.9182527  0.9482702  0.9230463  0.5399349 0.08230254
## 200223270003_R03C01  0.8539348  0.5177819  0.9455418  0.8593448  0.9315640 0.09661586
## 200223270003_R06C01  0.5874127  0.6288426  0.9418716  0.8467599  0.5177874 0.52399749
## 200223270003_R07C01  0.5555391  0.6060271  0.9426559  0.4986373  0.5377765 0.11587211
## 200223270006_R01C01  0.8026346  0.5599588  0.9461736  0.8978999  0.9109309 0.42115185
## 200223270006_R04C01  0.7903978  0.5441200  0.9508404  0.9239750  0.5266535 0.56043178
##                     cg11133939 cg16715186 cg03129555 cg08857872 cg06864789 cg14924512
## 200223270003_R02C01  0.1282694  0.2742789  0.6079616  0.3395280 0.05369415  0.5303907
## 200223270003_R03C01  0.5920898  0.7946153  0.5785498  0.8181845 0.46053125  0.9160885
## 200223270003_R06C01  0.5127706  0.8124316  0.9137818  0.2970779 0.87513655  0.9088414
## 200223270003_R07C01  0.8474176  0.7773263  0.9043041  0.2954090 0.49020327  0.9081681
## 200223270006_R01C01  0.8589133  0.8334531  0.9286357  0.8935876 0.47852685  0.9111789
## 200223270006_R04C01  0.5246557  0.8039945  0.9088564  0.8901338 0.05423587  0.5331753
##                     cg16652920 cg03084184 cg26219488 cg20913114 cg06378561 cg26948066
## 200223270003_R02C01  0.9436000  0.8162981  0.9336638 0.36510482  0.9389306  0.4685225
## 200223270003_R03C01  0.9431222  0.7877128  0.9134707 0.80382984  0.9377503  0.5026045
## 200223270003_R06C01  0.9457161  0.4546397  0.9261878 0.03158439  0.5154019  0.9101976
## 200223270003_R07C01  0.9419785  0.7812413  0.9217866 0.81256840  0.9403569  0.9379543
## 200223270006_R01C01  0.9529417  0.7818230  0.4929692 0.81502059  0.4956816  0.9120181
## 200223270006_R04C01  0.9492648  0.7725853  0.9431574 0.90468830  0.9268832  0.8868608
##                     cg25259265 cg06536614 cg24859648 cg12279734 cg03982462 cg05841700
## 200223270003_R02C01  0.4356646  0.5824474 0.83777536  0.6435368  0.8562777  0.2923544
## 200223270003_R03C01  0.8893591  0.5746694 0.44392797  0.1494651  0.6023731  0.9146488
## 200223270003_R06C01  0.4201700  0.5773468 0.03341185  0.8760759  0.8778458  0.3737990
## 200223270003_R07C01  0.4455517  0.5848917 0.43582347  0.8674214  0.8860227  0.5046468
## 200223270006_R01C01  0.8423337  0.5669919 0.03087161  0.6454450  0.8703107  0.8419031
## 200223270006_R04C01  0.8460736  0.5718514 0.02588024  0.8660058  0.8792860  0.9286652
##                     cg11227702 cg12146221 cg02621446 cg00616572 cg15535896 cg02372404
## 200223270003_R02C01 0.86486075  0.2049284  0.8731313  0.9335067  0.3382952 0.03598249
## 200223270003_R03C01 0.49184121  0.1814927  0.8095534  0.9214079  0.9253926 0.02767285
## 200223270003_R06C01 0.02543724  0.8619250  0.7511582  0.9113633  0.3320191 0.03127855
## 200223270003_R07C01 0.45150971  0.1238469  0.8773609  0.9160238  0.9409104 0.55685785
## 200223270006_R01C01 0.89086877  0.2021598  0.2046541  0.4861334  0.9326027 0.02587736
## 200223270006_R04C01 0.87675947  0.1383786  0.7963817  0.9067928  0.9156401 0.02828648
##                     cg09854620 cg04248279 cg20678988 cg24861747 cg10240127 cg16771215
## 200223270003_R02C01  0.5220587  0.8534976  0.8438718  0.3540897  0.9250553 0.88389723
## 200223270003_R03C01  0.8739646  0.8458854  0.8548886  0.4309505  0.9403255 0.07196933
## 200223270003_R06C01  0.8973149  0.8332786  0.7786685  0.8071462  0.9056974 0.09949974
## 200223270003_R07C01  0.8958863  0.3303204  0.8260541  0.3347317  0.9396217 0.64234023
## 200223270006_R01C01  0.9075331  0.5966878  0.3295384  0.3544795  0.9262370 0.62679274
## 200223270006_R04C01  0.9318820  0.8939599  0.8541667  0.5997840  0.9240497 0.06970175
##                     cg01667144 cg13080267 cg02494911 cg10750306 cg11438323 cg06715136
## 200223270003_R02C01  0.8971484 0.78936656  0.3049435 0.04919915  0.4863471  0.3400192
## 200223270003_R03C01  0.3175389 0.78371483  0.2416332 0.55160081  0.8984559  0.9259109
## 200223270003_R06C01  0.9238364 0.09436069  0.2520909 0.54694332  0.8722772  0.9079807
## 200223270003_R07C01  0.8739442 0.09351259  0.2457032 0.59824543  0.5026756  0.6782105
## 200223270006_R01C01  0.2931961 0.45173796  0.8045030 0.53158639  0.8809646  0.8369052
## 200223270006_R04C01  0.8616530 0.49866715  0.7489283 0.05646838  0.8717937  0.8807568
##                     cg04412904 cg12738248 cg03071582 cg05570109 cg15775217 cg24873924
## 200223270003_R02C01 0.05088595 0.85430866  0.9187811  0.3466611  0.5707441  0.3060635
## 200223270003_R03C01 0.07717659 0.88010292  0.5844421  0.5866750  0.9168327  0.8640985
## 200223270003_R06C01 0.08253743 0.51121855  0.6245558  0.4046471  0.6042521  0.8259149
## 200223270003_R07C01 0.06217431 0.09131476  0.9283683  0.6014355  0.9062231  0.8333940
## 200223270006_R01C01 0.11888769 0.91529345  0.5715416  0.5774881  0.9083515  0.8761177
## 200223270006_R04C01 0.08885846 0.91911405  0.6534650  0.8756826  0.6383270  0.8585363
##                     cg17738613 cg01921484 cg10369879 cg27341708 cg12534577 cg18821122
## 200223270003_R02C01  0.6879612 0.90985496  0.9218784 0.48846610  0.8585231  0.9291309
## 200223270003_R03C01  0.6582258 0.90931369  0.3149306 0.02613847  0.8493466  0.5901603
## 200223270003_R06C01  0.1022257 0.92044873  0.9141081 0.86893582  0.8395241  0.5779620
## 200223270003_R07C01  0.8960156 0.91674311  0.9054415 0.02642300  0.8511384  0.9251431
## 200223270006_R01C01  0.8850702 0.02943747  0.2917862 0.47573455  0.8804655  0.9217018
## 200223270006_R04C01  0.8481916 0.89057041  0.9200403 0.89411974  0.3029013  0.5412250
##                     cg12682323 cg05234269 cg20685672 cg12228670 cg11331837 cg01680303
## 200223270003_R02C01  0.9397956 0.93848584 0.67121006  0.8632174 0.03692842  0.5095174
## 200223270003_R03C01  0.9003940 0.57461229 0.79320906  0.8496212 0.57150125  0.1344941
## 200223270003_R06C01  0.9157877 0.02467208 0.66136456  0.8738949 0.03182862  0.7573869
## 200223270003_R07C01  0.9048877 0.56516794 0.80838304  0.8362189 0.03832164  0.4772204
## 200223270006_R01C01  0.1065347 0.94829529 0.08291414  0.8079694 0.93008298  0.1176263
## 200223270006_R04C01  0.8836232 0.56298286 0.84460055  0.6966666 0.54004452  0.5133033
##                     cg17421046  cg03088219 cg02356645 cg00322003 cg01013522 cg00272795
## 200223270003_R02C01  0.9026993 0.844002862  0.5105903  0.1759911  0.6251168 0.46365138
## 200223270003_R03C01  0.9112100 0.007435243  0.5833923  0.5702070  0.8862821 0.82839260
## 200223270003_R06C01  0.8952031 0.120155222  0.5701428  0.3077122  0.5425308 0.07231279
## 200223270003_R07C01  0.9268852 0.826554308  0.5683381  0.6104341  0.8429862 0.78303831
## 200223270006_R01C01  0.1118337 0.066294915  0.5233692  0.6147419  0.0480531 0.78219952
## 200223270006_R04C01  0.4174370 0.574738383  0.9188670  0.2293759  0.8240222 0.44408249
##                     cg25758034 cg26474732 cg16579946 cg07523188 cg11187460 cg14527649
## 200223270003_R02C01  0.6114028  0.7843252  0.6306315  0.7509183 0.03672179  0.2678912
## 200223270003_R03C01  0.6649219  0.8184088  0.6648766  0.1524386 0.92516409  0.7954683
## 200223270003_R06C01  0.2393844  0.7358417  0.6455081  0.7127592 0.03109553  0.8350610
## 200223270003_R07C01  0.7071501  0.7509296  0.8979650  0.8464983 0.53283119  0.8428684
## 200223270006_R01C01  0.2301078  0.8294938  0.6886498  0.7847738 0.54038146  0.8231348
## 200223270006_R04C01  0.6891513  0.8033167  0.6766907  0.8231277 0.91096169  0.8022444
##                     cg20370184 cg17429539 cg20507276 cg13885788 cg16178271 cg10738648
## 200223270003_R02C01 0.37710950  0.7860900 0.12238910  0.9380618  0.6445416 0.44931577
## 200223270003_R03C01 0.05737964  0.7100923 0.38721972  0.9369476  0.6178075 0.49894016
## 200223270003_R06C01 0.04740505  0.7660838 0.47978438  0.5163017  0.6641952 0.05552024
## 200223270003_R07C01 0.83572095  0.6984969 0.02261996  0.9183376  0.7148058 0.03730440
## 200223270006_R01C01 0.04056608  0.6508597 0.37465798  0.5525542  0.6138954 0.54952781
## 200223270006_R04C01 0.04038589  0.2828452 0.03570795  0.9328289  0.9414188 0.59358167
##                     cg26069044 cg25879395 cg06112204 cg23161429 cg25436480 cg26757229
## 200223270003_R02C01 0.92401867 0.88130864  0.5251592  0.8956965 0.84251599  0.6723726
## 200223270003_R03C01 0.94072227 0.02603438  0.8773488  0.9099619 0.49940321  0.1422661
## 200223270003_R06C01 0.93321315 0.91060615  0.8867975  0.8833895 0.34943119  0.7933794
## 200223270003_R07C01 0.56567694 0.89205942  0.5613799  0.9134709 0.85244913  0.8074830
## 200223270006_R01C01 0.94369927 0.47886249  0.9184122  0.8738558 0.44545117  0.5265692
## 200223270006_R04C01 0.02040391 0.02145248  0.9152514  0.9104210 0.02575036  0.7341953
##                     cg02932958 cg18339359 cg23916408 cg06950937 cg12784167 cg07480176
## 200223270003_R02C01  0.7901008  0.8824858  0.1942275  0.8910968 0.81503498  0.5171664
## 200223270003_R03C01  0.4210489  0.9040272  0.9154993  0.2889345 0.02811410  0.3760452
## 200223270003_R06C01  0.3825995  0.8552121  0.8886255  0.9143801 0.03073269  0.6998389
## 200223270003_R07C01  0.7617081  0.3073106  0.8872447  0.8891079 0.84775699  0.2189042
## 200223270006_R01C01  0.8431126  0.8973742  0.2219945  0.8868617 0.83825789  0.5570021
## 200223270006_R04C01  0.7610084  0.2292800  0.1520624  0.9093273 0.45475291  0.4501196
##                     cg15865722 cg27577781 cg05321907 cg03660162 cg07138269 cg20139683
## 200223270003_R02C01 0.89438595  0.8143535  0.2880477  0.8691767  0.5002290  0.8717075
## 200223270003_R03C01 0.90194372  0.8113185  0.1782629  0.5160770  0.9426707  0.9059433
## 200223270003_R06C01 0.92118977  0.8144274  0.8427929  0.9026304  0.5057781  0.8962554
## 200223270003_R07C01 0.09230759  0.7970617  0.8320504  0.5305691  0.9400527  0.9218012
## 200223270006_R01C01 0.93422668  0.8640044  0.2422218  0.9257451  0.9321602  0.1708472
## 200223270006_R04C01 0.92220002  0.8840237  0.2429551  0.8935772  0.9333501  0.1067122
##                     cg12284872 cg03327352 cg23658987 cg21854924 cg21697769 cg19512141
## 200223270003_R02C01  0.8008333  0.8851712 0.79757644  0.8729132  0.8946108  0.8209161
## 200223270003_R03C01  0.7414569  0.8786878 0.07511718  0.7162342  0.2822953  0.7903543
## 200223270003_R06C01  0.7725267  0.3042310 0.10177571  0.7520990  0.8698740  0.8404684
## 200223270003_R07C01  0.7573369  0.8273211 0.46747992  0.8641284  0.9134887  0.2202759
## 200223270006_R01C01  0.7201607  0.8774082 0.76831297  0.6498895  0.2683820  0.8059589
## 200223270006_R04C01  0.8021446  0.8829492 0.08988532  0.5943113  0.2765740  0.7020247
##                     cg08198851 cg00675157 cg01153376 cg01933473 cg12776173 cg14564293
## 200223270003_R02C01  0.6578905  0.9188438  0.4872148  0.2589014 0.10388038 0.52089591
## 200223270003_R03C01  0.6578186  0.9242325  0.9639670  0.6726133 0.87306345 0.04000662
## 200223270003_R06C01  0.1272153  0.9254708  0.2242410  0.2642560 0.70094907 0.04959460
## 200223270003_R07C01  0.8351465  0.5447244  0.5155654  0.1978068 0.11367159 0.03114773
## 200223270006_R01C01  0.8791156  0.5173554  0.9588916  0.7599441 0.09458405 0.51703196
## 200223270006_R04C01  0.1423737  0.9247232  0.9586876  0.7405661 0.86532175 0.51535010
##                     cg24851651 cg22274273 cg25561557 cg21209485 cg10985055 cg14293999
## 200223270003_R02C01 0.03674702  0.4209386 0.76736369  0.8865053  0.8518169  0.2836710
## 200223270003_R03C01 0.05358297  0.4246379 0.03851635  0.8714878  0.8631895  0.9172023
## 200223270003_R06C01 0.05968923  0.4196796 0.47259480  0.2292550  0.5456633  0.9168166
## 200223270003_R07C01 0.60864179  0.4164100 0.43364249  0.2351526  0.8825100  0.9188336
## 200223270006_R01C01 0.08825834  0.7951105 0.46211439  0.8882046  0.8841690  0.1971116
## 200223270006_R04C01 0.91932068  0.0229810 0.44651530  0.2292483  0.8407797  0.9030919
##                     cg18819889 cg24506579 cg19377607 cg06697310 cg00696044 cg01549082
## 200223270003_R02C01  0.9156157  0.5244337 0.05377464  0.8454609 0.55608424  0.2924138
## 200223270003_R03C01  0.9004455  0.5794845 0.90570746  0.8653044 0.07552381  0.7065693
## 200223270003_R06C01  0.9054439  0.9427785 0.06636174  0.2405168 0.79270858  0.2895440
## 200223270003_R07C01  0.9089935  0.9323844 0.68788639  0.8479193 0.03548419  0.6422955
## 200223270006_R01C01  0.9065397  0.9185355 0.06338988  0.8206613 0.10714386  0.8471236
## 200223270006_R04C01  0.9242767  0.4332642 0.91551446  0.7839595 0.18420803  0.6949888
##                     cg01128042 cg00999469 cg06118351 cg12012426 cg08584917 cg27272246
## 200223270003_R02C01  0.9113420  0.3274080 0.36339400  0.9165048  0.5663205  0.8615873
## 200223270003_R03C01  0.5328806  0.2857719 0.47148604  0.9434768  0.9019732  0.8705287
## 200223270003_R06C01  0.5222757  0.2499229 0.86559618  0.9220044  0.9187789  0.8103777
## 200223270003_R07C01  0.5141721  0.2819622 0.83494303  0.9241284  0.6007449  0.0310881
## 200223270006_R01C01  0.9321215  0.2933539 0.02632111  0.9327143  0.9069098  0.7686536
## 200223270006_R04C01  0.5050081  0.2966623 0.83329300  0.9271167  0.9263584  0.4403542
##                     cg15633912 cg16788319 cg17906851 cg07028768 cg27086157 cg14240646
## 200223270003_R02C01  0.1605530  0.9379870  0.9488392  0.4496851  0.9224112  0.5391334
## 200223270003_R03C01  0.9333421  0.8913429  0.9529718  0.8536078  0.9219304  0.2538363
## 200223270003_R06C01  0.8737362  0.8680680  0.6462151  0.8356936  0.3224986  0.1864902
## 200223270003_R07C01  0.9137334  0.8811444  0.9553497  0.4245893  0.3455486  0.6402007
## 200223270006_R01C01  0.9169706  0.3123481  0.6222117  0.8835151  0.8988962  0.7696079
## 200223270006_R04C01  0.8890004  0.2995627  0.6441202  0.4514661  0.9159217  0.1490028
##                     cg00154902 cg14307563 cg02320265 cg08779649 cg04664583 cg12466610
## 200223270003_R02C01  0.5137741  0.1855966  0.8853213 0.44449401  0.5572814 0.05767659
## 200223270003_R03C01  0.8540746  0.8916957  0.4686314 0.45076825  0.5881190 0.59131778
## 200223270003_R06C01  0.8188126  0.8750052  0.4838749 0.04810217  0.9352717 0.06939623
## 200223270003_R07C01  0.4625776  0.8975663  0.8986848 0.42715969  0.9350230 0.04527733
## 200223270006_R01C01  0.4690086  0.8762842  0.8987560 0.89313476  0.9424588 0.05212904
## 200223270006_R04C01  0.4547219  0.9168614  0.4768520 0.59523771  0.9379537 0.05104033
##                     cg27639199 cg15501526 cg00689685 cg01413796 cg11247378  age.now
## 200223270003_R02C01 0.67515415  0.6362531  0.7019389  0.1345128  0.1591185 82.40000
## 200223270003_R03C01 0.67552763  0.6319253  0.8634268  0.2830672  0.7874849 78.60000
## 200223270003_R06C01 0.06233093  0.7435100  0.6378795  0.8194681  0.4807942 80.40000
## 200223270003_R07C01 0.05701332  0.7756577  0.8624541  0.9007710  0.4537348 78.16441
## 200223270006_R01C01 0.05037694  0.3230777  0.6361891  0.2603027  0.1537079 62.90000
## 200223270006_R04C01 0.08144161  0.8342695  0.6356260  0.9207672  0.1686356 80.67796

9.3.2. Logistic Regression Model

9.3.2.1 Logistic Regression Model Training

# Use the processed data and feature names from the preceding section
df_LRM1<-processed_data 
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
# Stratified 70/30 train/test split on the diagnosis label DX
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 455 156
dim(testData)
## [1] 193 156
# 5-fold cross-validation; caret tunes the glmnet (elastic-net) parameters
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Freq_LRM1<-caret::confusionMatrix(predictions, testData$DX)

print(cm_FeatEval_Freq_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       46        7  14
##   Dementia  3       10   4
##   MCI      17       11  81
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7098          
##                  95% CI : (0.6403, 0.7728)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 2.018e-08       
##                                           
##                   Kappa : 0.4987          
##                                           
##  Mcnemar's Test P-Value : 0.1607          
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6970         0.35714     0.8182
## Specificity             0.8346         0.95758     0.7021
## Pos Pred Value          0.6866         0.58824     0.7431
## Neg Pred Value          0.8413         0.89773     0.7857
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2383         0.05181     0.4197
## Detection Prevalence    0.3472         0.08808     0.5648
## Balanced Accuracy       0.7658         0.65736     0.7602
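As a quick sanity check (this snippet is not part of the original pipeline), the per-class balanced accuracy printed above is simply the mean of sensitivity and specificity; the CN row can be rebuilt from the raw confusion-matrix counts:

```r
# Sanity check (not in the original pipeline): rebuild the Class: CN
# statistics from the confusion-matrix counts printed above.
ref_CN       <- 46 + 3 + 17   # reference-CN column total (66)
false_pos_CN <- 7 + 14        # non-CN samples predicted as CN
sens_CN   <- 46 / ref_CN                                     # 0.6970
spec_CN   <- (193 - ref_CN - false_pos_CN) / (193 - ref_CN)  # 0.8346
balacc_CN <- (sens_CN + spec_CN) / 2                         # 0.7658
round(c(sensitivity = sens_CN, specificity = spec_CN, balanced = balacc_CN), 4)
```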
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Freq_LRM1_Accuracy <- cm_FeatEval_Freq_LRM1$overall["Accuracy"]
cm_FeatEval_Freq_LRM1_Kappa <- cm_FeatEval_Freq_LRM1$overall["Kappa"]

print(cm_FeatEval_Freq_LRM1_Accuracy)
##  Accuracy 
## 0.7098446
print(cm_FeatEval_Freq_LRM1_Kappa)
##     Kappa 
## 0.4987013
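The reported Kappa can likewise be reproduced from the confusion matrix above via Cohen's formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e the expected chance agreement (again a sanity check, not part of the pipeline):

```r
# Sanity check (not in the original pipeline): Cohen's kappa from the
# confusion matrix printed earlier (columns = reference classes).
cm <- matrix(c(46, 3, 17,    # reference CN
               7, 10, 11,    # reference Dementia
               14, 4, 81),   # reference MCI
             nrow = 3)
n   <- sum(cm)                               # 193 test samples
p_o <- sum(diag(cm)) / n                     # observed accuracy
p_e <- sum(rowSums(cm) * colSums(cm)) / n^2  # expected chance agreement
kappa <- (p_o - p_e) / (1 - p_e)
round(kappa, 4)                              # 0.4987
```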
print(model_LRM1)
## glmnet 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001810831  0.6350263  0.3962356
##   0.10   0.0018108309  0.6460636  0.4102125
##   0.10   0.0181083090  0.6548792  0.4144240
##   0.55   0.0001810831  0.6285290  0.3793868
##   0.55   0.0018108309  0.6505792  0.4121576
##   0.55   0.0181083090  0.6483336  0.3870111
##   1.00   0.0001810831  0.6065010  0.3457739
##   1.00   0.0018108309  0.6394930  0.3907984
##   1.00   0.0181083090  0.5867925  0.2663062
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01810831.
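caret's default search tried only three values each of alpha and lambda; if a denser search is wanted, an explicit tuning grid can be passed to `caret::train` via `tuneGrid` (a sketch, not part of the original run; the grid values below are illustrative):

```r
# Optional denser elastic-net grid (sketch): alpha = 0 is ridge,
# alpha = 1 is lasso; lambda controls the penalty strength.
tune_grid <- expand.grid(alpha  = seq(0, 1, by = 0.25),
                         lambda = 10^seq(-4, -1, length.out = 10))
nrow(tune_grid)   # 50 candidate (alpha, lambda) pairs
# model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet",
#                            trControl = ctrl, tuneGrid = tune_grid)
```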
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)

FeatEval_Freq_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.96043956043956"
print(FeatEval_Freq_LRM1_trainAccuracy)
## [1] 0.9604396
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.6329108
FeatEval_Freq_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Freq_mean_accuracy_cv_LRM1)
## [1] 0.6329108
library(caret)
library(pROC)
# Binary-classification flags: compute a single ROC/AUC for the positive class.
# Flag 5 uses "MCI", flags 4 and 6 use "Dementia", flag 3 uses "CI".
if (METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
  positive_class <- switch(as.character(METHOD_FEATURE_FLAG),
                           "3" = "CI", "4" = "Dementia",
                           "5" = "MCI", "6" = "Dementia")
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX,
                   prob_predictions[, positive_class], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
  # Multi-class case: one-versus-rest ROC/AUC for each class
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)

  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }

  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }

  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8487
## The AUC value for class CN is: 0.8487235 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.831
## The AUC value for class Dementia is: 0.8309524 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8189
## The AUC value for class MCI is: 0.818934

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_LRM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.83287
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##                CN  Dementia    MCI
## PC1        90.421 1.000e+02  0.000
## PC2        46.588 7.877e+01  0.000
## PC3         5.926 0.000e+00 68.326
## cg00962106 63.062 1.184e+01 36.931
## cg02225060 23.012 1.264e+01 51.144
## cg14710850 49.617 8.388e+00 25.395
## cg27452255 49.059 1.788e+01 11.818
## cg02981548 26.229 5.642e+00 49.026
## cg08861434 48.679 0.000e+00 42.749
## cg19503462 25.912 4.811e+01  5.779
## cg07152869 27.983 4.673e+01  1.349
## cg16749614 11.546 1.796e+01 45.937
## cg05096415  1.408 4.491e+01 28.926
## cg23432430 44.231 3.504e+00 25.258
## cg17186592  3.091 4.201e+01 26.690
## cg00247094 15.880 4.167e+01 10.430
## cg09584650 41.416 6.519e+00 18.541
## cg11133939 24.211 4.137e-03 40.491
## cg16715186 39.196 7.696e+00 17.048
## cg03129555 12.455 3.861e+01  8.425
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
  # Binary case: rank features by the final glmnet model's overall importance
  importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

  library(dplyr)
  ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>%
    arrange(desc(Overall))

  print(ordered_importance_final_model_LRM1)
}
if(METHOD_FEATURE_FLAG==1){
  # Multi-class case: for each feature, take the maximum importance
  # across the three classes and sort by it
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
##             CN     Dementia         MCI    Feature MaxImportance
## 1   90.4211914 1.000000e+02  0.00000000        PC1   100.0000000
## 2   46.5878880 7.876540e+01  0.00000000        PC2    78.7654000
## 3    5.9264601 0.000000e+00 68.32645216        PC3    68.3264522
## 4   63.0622141 1.184076e+01 36.93111422 cg00962106    63.0622141
## 5   23.0122858 1.263505e+01 51.14401263 cg02225060    51.1440126
## 6   49.6168920 8.388454e+00 25.39546443 cg14710850    49.6168920
## 7   49.0594808 1.787521e+01 11.81773067 cg27452255    49.0594808
## 8   26.2290394 5.642018e+00 49.02601867 cg02981548    49.0260187
## 9   48.6793071 0.000000e+00 42.74904997 cg08861434    48.6793071
## 10  25.9123492 4.811130e+01  5.77902211 cg19503462    48.1113013
## 11  27.9831242 4.672741e+01  1.34850115 cg07152869    46.7274142
## 12  11.5461891 1.795969e+01 45.93728159 cg16749614    45.9372816
## 13   1.4076950 4.490547e+01 28.92648458 cg05096415    44.9054669
## 14  44.2311676 3.504405e+00 25.25809202 cg23432430    44.2311676
## 15   3.0905298 4.200810e+01 26.68998266 cg17186592    42.0081020
## 16  15.8802196 4.167036e+01 10.43007825 cg00247094    41.6703641
## 17  41.4157285 6.518767e+00 18.54140543 cg09584650    41.4157285
## 18  24.2112262 4.137340e-03 40.49126516 cg11133939    40.4912652
## 19  39.1959650 7.696332e+00 17.04820793 cg16715186    39.1959650
## 20  12.4551007 3.861282e+01  8.42549585 cg03129555    38.6128234
## 21   3.1900587 2.010635e+01 38.48747637 cg08857872    38.4874764
## 22  12.1253718 3.682210e+01 11.12411712 cg06864789    36.8220973
## 23   0.0000000 3.530027e+01 26.72665764 cg14924512    35.3002711
## 24   7.2142317 1.187576e+01 34.92323965 cg16652920    34.9232396
## 25  19.1380264 3.459417e+01  0.00000000 cg03084184    34.5941651
## 26   3.6594506 1.335526e+01 34.15727920 cg26219488    34.1572792
## 27  13.4827036 3.380113e+01  6.05577323 cg20913114    33.8011320
## 28   7.1354711 3.347923e+01 11.82731988 cg06378561    33.4792338
## 29  33.3253288 1.548490e+01  2.09925955 cg26948066    33.3253288
## 30   0.5754957 3.329088e+01 17.46770718 cg25259265    33.2908817
## 31  33.2453952 0.000000e+00 21.54807871 cg06536614    33.2453952
## 32   1.6428181 3.231449e+01 17.24494950 cg24859648    32.3144876
## 33  12.7554337 3.077525e+01  2.20670041 cg12279734    30.7752546
## 34  30.6869432 1.116253e+01  2.48770629 cg03982462    30.6869432
## 35   1.2151951 3.061524e+01 16.61247988 cg05841700    30.6152440
## 36  29.8393321 7.666783e+00  7.71316484 cg11227702    29.8393321
## 37  25.3613752 0.000000e+00 29.01436336 cg12146221    29.0143634
## 38   9.6461764 8.950361e+00 28.93559238 cg02621446    28.9355924
## 39   0.0000000 2.259963e+01 28.82432888 cg00616572    28.8243289
## 40  28.4378228 8.978994e+00  6.54134607 cg15535896    28.4378228
## 41  25.4549752 0.000000e+00 28.22163976 cg02372404    28.2216398
## 42   5.0595151 2.778641e+01  8.14331041 cg09854620    27.7864115
## 43  27.6071611 0.000000e+00 15.87051880 cg04248279    27.6071611
## 44   4.0161409 7.689031e+00 27.54146620 cg20678988    27.5414662
## 45   0.0000000 2.751970e+01 13.83635369 cg24861747    27.5196952
## 46  27.4725483 1.566027e+01  0.00000000 cg10240127    27.4725483
## 47   7.7707423 7.230913e+00 27.21848262 cg16771215    27.2184826
## 48   0.6456766 2.697502e+01 14.65302759 cg01667144    26.9750157
## 49  26.9422636 8.953253e+00  2.80393480 cg13080267    26.9422636
## 50   0.0000000 2.616443e+01 26.57745234 cg02494911    26.5774523
## 51   9.3807400 2.645056e+01  5.12022061 cg10750306    26.4505591
## 52  25.4571262 1.207867e+00 11.26327346 cg11438323    25.4571262
## 53   4.8728912 4.039264e+00 25.43196817 cg06715136    25.4319682
## 54  25.1306218 0.000000e+00 15.37088902 cg04412904    25.1306218
## 55   4.7708807 2.485568e+01  5.40428961 cg12738248    24.8556786
## 56  24.4373049 0.000000e+00 18.64901449 cg03071582    24.4373049
## 57   0.0000000 2.430976e+01 15.78618979 cg05570109    24.3097592
## 58  24.2234675 2.027209e+01  0.00000000 cg15775217    24.2234675
## 59   0.0000000 1.993016e+01 24.20338222 cg24873924    24.2033822
## 60   7.5571475 4.145911e+00 24.12000164 cg17738613    24.1200016
## 61  23.8685802 0.000000e+00 20.76931879 cg01921484    23.8685802
## 62   0.0000000 1.628479e+01 23.70800855 cg10369879    23.7080086
## 63   0.0000000 1.838316e+01 23.65118269 cg27341708    23.6511827
## 64   0.0000000 2.355705e+01 21.42857702 cg12534577    23.5570536
## 65   0.0000000 2.343504e+01 17.81796855 cg18821122    23.4350422
## 66   4.6159402 6.920082e+00 23.35097600 cg12682323    23.3509760
## 67  23.3199397 0.000000e+00 14.18264753 cg05234269    23.3199397
## 68  23.0568545 0.000000e+00 22.77665933 cg20685672    23.0568545
## 69  20.3675386 0.000000e+00 22.86397213 cg12228670    22.8639721
## 70  22.7116735 3.671494e+00  8.32801266 cg11331837    22.7116735
## 71   0.0000000 2.268500e+01 20.87817301 cg01680303    22.6849966
## 72  22.4172909 1.166771e+00 10.22563277 cg17421046    22.4172909
## 73  22.2743617 1.928962e+01  0.00000000 cg00322003    22.2743617
## 74  22.2737985 8.049754e+00  2.25464756 cg03088219    22.2737985
## 75  22.2424977 1.528025e+01  0.00000000 cg02356645    22.2424977
## 76   5.8933810 2.207741e+01  1.26303407 cg01013522    22.0774149
## 77  12.6590030 0.000000e+00 21.77176163 cg00272795    21.7717616
## 78  21.6589655 0.000000e+00 14.53301798 cg25758034    21.6589655
## 79   4.7841837 2.163888e+01  1.18656219 cg26474732    21.6388832
## 80   0.0000000 2.126609e+01 17.64235223 cg16579946    21.2660881
## 81   9.5980250 2.121696e+01  0.00000000 cg07523188    21.2169601
## 82  21.2108554 4.532914e+00  5.64881948 cg11187460    21.2108554
## 83   0.0000000 1.703619e+01 20.80948044 cg14527649    20.8094804
## 84   2.7320807 4.853792e+00 20.53830769 cg20370184    20.5383077
## 85  20.5042022 0.000000e+00 13.74634533 cg17429539    20.5042022
## 86   0.0000000 2.029016e+01 10.01093515 cg20507276    20.2901584
## 87   1.1840742 6.815529e+00 20.19225461 cg13885788    20.1922546
## 88   0.0000000 1.556537e+01 20.08333284 cg16178271    20.0833328
## 89   5.5939181 1.529093e+00 19.98843387 cg10738648    19.9884339
## 90   5.1485052 1.992674e+01  2.76062708 cg26069044    19.9267407
## 91   3.2006638 4.954857e+00 19.79636965 cg25879395    19.7963697
## 92  19.6502331 0.000000e+00 12.11739199 cg06112204    19.6502331
## 93   3.2266078 1.921297e+01  1.25563874 cg23161429    19.2129728
## 94  19.0436333 0.000000e+00  8.86160146 cg25436480    19.0436333
## 95  18.8899765 1.898591e+01  0.00000000 cg26757229    18.9859061
## 96  18.8539813 8.150368e+00  0.00000000 cg02932958    18.8539813
## 97   6.3413123 1.862430e+01  0.95143518 cg18339359    18.6242952
## 98  18.5798880 1.513211e+00  1.88583048 cg06950937    18.5798880
## 99  12.0389141 1.857900e+01  0.00000000 cg23916408    18.5790048
## 100  1.5240724 3.185635e+00 18.16459540 cg12784167    18.1645954
## 101 11.9014906 0.000000e+00 18.13462538 cg07480176    18.1346254
## 102  0.0000000 5.493060e+00 17.69570496 cg15865722    17.6957050
## 103 17.6582944 0.000000e+00 13.07004017 cg27577781    17.6582944
## 104 17.1627270 2.947542e+00  2.52613791 cg05321907    17.1627270
## 105 16.8711800 0.000000e+00  7.57596539 cg03660162    16.8711800
## 106 16.7270034 0.000000e+00  9.94809152 cg07138269    16.7270034
## 107 16.7141446 8.370983e-04  5.48685893 cg20139683    16.7141446
## 108  1.5110047 1.660893e+01  3.59685475 cg12284872    16.6089331
## 109 16.5523009 0.000000e+00 15.31421809 cg03327352    16.5523009
## 110  0.0000000 1.652147e+01 12.90253736 cg23658987    16.5214740
## 111  0.0000000 1.473495e+01 16.18728682 cg21854924    16.1872868
## 112 15.7882584 0.000000e+00  6.82534076 cg21697769    15.7882584
## 113 15.6543993 5.763288e+00  0.00000000 cg19512141    15.6543993
## 114 10.3013242 0.000000e+00 15.49206958 cg08198851    15.4920696
## 115  0.4210073 1.508166e+01  0.82546402 cg00675157    15.0816601
## 116  0.0000000 5.704214e+00 15.02064792 cg01153376    15.0206479
## 117  1.8055061 1.496164e+01  0.76647969 cg01933473    14.9616407
## 118 14.8932694 0.000000e+00  4.58710848 cg12776173    14.8932694
## 119  0.0000000 1.065994e+01 14.72714332 cg14564293    14.7271433
## 120 12.4116879 0.000000e+00 14.56951596 cg24851651    14.5695160
## 121  0.0000000 1.452135e+01  2.25494783 cg22274273    14.5213516
## 122 12.7916783 1.451109e+01  0.00000000 cg25561557    14.5110857
## 123 13.7825866 1.440027e+01  0.00000000 cg21209485    14.4002713
## 124  3.9006296 1.430400e+01  0.00000000 cg10985055    14.3040000
## 125  8.0875881 0.000000e+00 14.25269895 cg14293999    14.2526989
## 126  0.0000000 6.075319e+00 13.99960727 cg18819889    13.9996073
## 127  7.9179369 1.389950e+01  0.00000000 cg24506579    13.8995029
## 128 10.4815163 0.000000e+00 13.82052426 cg19377607    13.8205243
## 129  2.6249344 1.359909e+01  0.00000000 cg06697310    13.5990934
## 130 13.5718494 0.000000e+00 10.16664025 cg00696044    13.5718494
## 131  0.0000000 0.000000e+00 13.10339546 cg01549082    13.1033955
## 132  0.0000000 6.890626e+00 13.07660092 cg01128042    13.0766009
## 133  0.2664728 1.247937e+01  1.15838593 cg00999469    12.4793749
## 134  0.0000000 1.077643e+01 12.38849837 cg06118351    12.3884984
## 135  0.0000000 1.124674e+01 11.78627370 cg12012426    11.7862737
## 136 11.7234564 9.459104e+00  0.00000000 cg08584917    11.7234564
## 137  0.0000000 1.167309e+01  2.25087645 cg15633912    11.6730940
## 138 11.6725674 0.000000e+00 11.20194712 cg27272246    11.6725674
## 139 11.3317090 1.972310e+00  0.00000000 cg17906851    11.3317090
## 140  1.1928989 1.133121e+01  0.00000000 cg16788319    11.3312105
## 141  8.9892259 0.000000e+00 11.29248054 cg07028768    11.2924805
## 142  0.0000000 3.124590e+00 10.74047353 cg27086157    10.7404735
## 143  1.7933150 9.609129e+00  0.00000000 cg14240646     9.6091292
## 144  0.0000000 9.463243e+00  9.19135560 cg00154902     9.4632430
## 145  6.6623080 0.000000e+00  9.11133270 cg14307563     9.1113327
## 146  0.0000000 8.531587e+00  0.00000000 cg02320265     8.5315872
## 147  8.2135222 0.000000e+00  7.03393427 cg08779649     8.2135222
## 148  7.6553807 0.000000e+00  7.95295636 cg04664583     7.9529564
## 149  0.0000000 0.000000e+00  6.58682703 cg12466610     6.5868270
## 150  6.2549228 3.707997e+00  0.00000000 cg27639199     6.2549228
## 151  0.0000000 0.000000e+00  5.80982245 cg15501526     5.8098225
## 152  0.0000000 4.840766e+00  3.67774019 cg00689685     4.8407663
## 153  2.7970199 0.000000e+00  0.08162381 cg01413796     2.7970199
## 154  0.0000000 0.000000e+00  2.13884039 cg11247378     2.1388404
## 155  0.5215083 0.000000e+00  0.63572687    age.now     0.6357269
# Install reshape2 if it is missing; require() attaches it on success
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}

if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##           CN     Dementia       MCI    Feature MaxImportance
## 1  90.421191 100.00000000  0.000000        PC1     100.00000
## 2  46.587888  78.76540001  0.000000        PC2      78.76540
## 3   5.926460   0.00000000 68.326452        PC3      68.32645
## 4  63.062214  11.84075872 36.931114 cg00962106      63.06221
## 5  23.012286  12.63505366 51.144013 cg02225060      51.14401
## 6  49.616892   8.38845390 25.395464 cg14710850      49.61689
## 7  49.059481  17.87521244 11.817731 cg27452255      49.05948
## 8  26.229039   5.64201829 49.026019 cg02981548      49.02602
## 9  48.679307   0.00000000 42.749050 cg08861434      48.67931
## 10 25.912349  48.11130126  5.779022 cg19503462      48.11130
## 11 27.983124  46.72741425  1.348501 cg07152869      46.72741
## 12 11.546189  17.95969178 45.937282 cg16749614      45.93728
## 13  1.407695  44.90546688 28.926485 cg05096415      44.90547
## 14 44.231168   3.50440539 25.258092 cg23432430      44.23117
## 15  3.090530  42.00810202 26.689983 cg17186592      42.00810
## 16 15.880220  41.67036405 10.430078 cg00247094      41.67036
## 17 41.415728   6.51876720 18.541405 cg09584650      41.41573
## 18 24.211226   0.00413734 40.491265 cg11133939      40.49127
## 19 39.195965   7.69633216 17.048208 cg16715186      39.19596
## 20 12.455101  38.61282340  8.425496 cg03129555      38.61282
## [1] "the top 20 features based on max way:"
##  [1] "PC1"        "PC2"        "PC3"        "cg00962106" "cg02225060" "cg14710850" "cg27452255"
##  [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555"

9.3.2.2 Model Diagnosis & Improvement

9.3.2.2.1 Class Imbalance
Class Imbalance Check
  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##       CN Dementia      MCI 
##      221       94      333
prop.table(table(df_LRM1$DX))
## 
##        CN  Dementia       MCI 
## 0.3410494 0.1450617 0.5138889
table(trainData$DX)
## 
##       CN Dementia      MCI 
##      155       66      234
prop.table(table(trainData$DX))
## 
##        CN  Dementia       MCI 
## 0.3406593 0.1450549 0.5142857
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio, i.e. the ratio of the number of samples in the majority class to the number of samples in the minority class. A high ratio indicates severe class imbalance.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance radio of the whole data set is:")
    ## [1] "The imbalance radio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 3.542553
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance radio of the training data set is:")
    ## [1] "The imbalance radio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 3.545455
  • Let’s run a chi-square test, which can determine whether the class distribution deviates significantly from a balanced distribution. The test’s p-value indicates the significance of the class imbalance.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 132.4, df = 2, p-value < 2.2e-16
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 93.156, df = 2, p-value < 2.2e-16
Address Class Imbalance with “SMOTE” (preliminary; may need further improvement)
library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)

balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##       CN Dementia      MCI 
##      155      132      234
dim(balanced_data_LGR_1)
## [1] 521 156
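Note that with dup_size = 1 the SMOTE call above only doubles the Dementia class (132 vs. 234 MCI), so the data are still not fully balanced. A minimal base-R sketch of choosing a dup_size that roughly equalizes the classes (the counts are the training-set counts printed above; the variable names are illustrative):

```r
# Sketch: choose dup_size so the minority class roughly matches the majority.
# These are the training-set class counts printed above (CN / Dementia / MCI).
class_counts <- c(CN = 155, Dementia = 66, MCI = 234)
minority_n <- min(class_counts)   # 66 (Dementia)
majority_n <- max(class_counts)   # 234 (MCI)

# smotefamily::SMOTE() adds dup_size synthetic samples per minority
# observation, so minority_n * (1 + dup_size) should approach majority_n.
dup_size_needed <- ceiling(majority_n / minority_n) - 1
dup_size_needed   # 3, giving 66 * (1 + 3) = 264 Dementia samples
```

With dup_size = 3 the augmented Dementia class would have about 264 samples, much closer to the 234 MCI samples than the 132 obtained with dup_size = 1.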
Fit Model with Balanced Data
ctrl <- trainControl(method = "cv", number = 5)

model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       45        6  15
##   Dementia  4       11   6
##   MCI      17       11  78
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6943          
##                  95% CI : (0.6241, 0.7584)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 2.356e-07       
##                                           
##                   Kappa : 0.4779          
##                                           
##  Mcnemar's Test P-Value : 0.5733          
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6818         0.39286     0.7879
## Specificity             0.8346         0.93939     0.7021
## Pos Pred Value          0.6818         0.52381     0.7358
## Neg Pred Value          0.8346         0.90116     0.7586
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2332         0.05699     0.4041
## Detection Prevalence    0.3420         0.10881     0.5492
## Balanced Accuracy       0.7582         0.66613     0.7450
print(model_LRM2)
## glmnet 
## 
## 521 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 416, 417, 417, 417, 417 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa    
##   0.10   0.000186946  0.7103114  0.5552305
##   0.10   0.001869460  0.7121978  0.5563269
##   0.10   0.018694597  0.7160989  0.5621857
##   0.55   0.000186946  0.6987912  0.5369622
##   0.55   0.001869460  0.7102930  0.5525186
##   0.55   0.018694597  0.6872894  0.5142517
##   1.00   0.000186946  0.6834432  0.5136505
##   1.00   0.001869460  0.7045238  0.5443300
##   1.00   0.018694597  0.6468864  0.4489232
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0186946.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")


train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.958241758241758"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.6966484
importance_model_LRM2 <- varImp(model_LRM2)


print(importance_model_LRM2)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##                CN Dementia    MCI
## PC1        80.688  100.000  0.000
## PC2        38.814   80.731  0.000
## cg00962106 56.202    9.099 33.493
## PC3         7.467    0.000 55.895
## cg19503462 26.324   48.654  6.540
## cg27452255 47.910   21.190  8.082
## cg07152869 27.972   45.992  1.294
## cg05096415  3.341   45.590 28.316
## cg02225060 18.264   12.784 45.588
## cg14710850 45.329    8.655 21.701
## cg02981548 23.095    5.927 45.307
## cg08861434 44.864    0.000 36.603
## cg03129555 14.463   42.033 10.566
## cg23432430 41.989    6.879 20.293
## cg16749614  8.920   17.010 41.732
## cg17186592  3.597   40.136 25.167
## cg14924512  1.857   38.982 23.218
## cg09584650 38.237    7.571 15.083
## cg06864789 13.550   38.081 11.898
## cg03084184 19.832   37.856  3.065
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5|| METHOD_FEATURE_FLAG==6){

importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)

library(dplyr)

ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM2)  
  
}
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case,
  # for each feature we choose the maximum importance value across the classes.
  # Add a column for the maximum importance
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM2_df)
  
}
##               CN     Dementia          MCI    Feature MaxImportance
## 1   80.688499148 100.00000000  0.000000000        PC1   100.0000000
## 2   38.814293894  80.73116981  0.000000000        PC2    80.7311698
## 3   56.201953611   9.09911968 33.493095025 cg00962106    56.2019536
## 4    7.467341807   0.00000000 55.895232595        PC3    55.8952326
## 5   26.324357187  48.65428391  6.540366672 cg19503462    48.6542839
## 6   47.909844955  21.19003754  8.081998635 cg27452255    47.9098450
## 7   27.972226763  45.99197183  1.294371256 cg07152869    45.9919718
## 8    3.340505452  45.58983812 28.315599957 cg05096415    45.5898381
## 9   18.264219950  12.78421147 45.587782522 cg02225060    45.5877825
## 10  45.329062512   8.65503354 21.701060361 cg14710850    45.3290625
## 11  23.094859080   5.92669780 45.307400928 cg02981548    45.3074009
## 12  44.864165165   0.00000000 36.602762852 cg08861434    44.8641652
## 13  14.463094695  42.03327794 10.566238507 cg03129555    42.0332779
## 14  41.989180142   6.87894102 20.292655519 cg23432430    41.9891801
## 15   8.920053612  17.00950243 41.731798290 cg16749614    41.7317983
## 16   3.597324390  40.13634499 25.166657777 cg17186592    40.1363450
## 17   1.857038695  38.98191970 23.217726210 cg14924512    38.9819197
## 18  38.237414837   7.57137035 15.082695906 cg09584650    38.2374148
## 19  13.550341036  38.08096223 11.897814383 cg06864789    38.0809622
## 20  19.832322635  37.85594401  3.065421777 cg03084184    37.8559440
## 21  21.501742479   0.51318561 37.528872407 cg11133939    37.5288724
## 22  13.600172125  37.19457507  9.115549215 cg00247094    37.1945751
## 23   0.542713010  20.67739292 35.721943012 cg08857872    35.7219430
## 24  35.486170490   7.96249429 14.042364819 cg16715186    35.4861705
## 25   4.935661334  35.04584177 17.439094906 cg24859648    35.0458418
## 26  14.089477135  34.55819164  5.437526844 cg12279734    34.5581916
## 27   1.732652241  34.10645155 18.443144849 cg25259265    34.1064516
## 28   8.424378469  34.06823359 11.647480632 cg06378561    34.0682336
## 29   2.315256778  13.35831728 31.978113840 cg26219488    31.9781138
## 30  12.472634453  31.59090814  5.775875747 cg20913114    31.5909081
## 31   5.489171457  11.24392869 31.377264132 cg16652920    31.3772641
## 32   1.405245869  30.97052414 17.379212292 cg05841700    30.9705241
## 33  29.676489142  14.07539866  0.805144625 cg26948066    29.6764891
## 34  28.737869928  12.28540079  0.030031567 cg03982462    28.7378699
## 35  28.258759627   8.09966055  6.642149391 cg11227702    28.2587596
## 36   6.460205795  28.05459552  8.137764089 cg09854620    28.0545955
## 37  27.468559856   0.00000000 21.547011800 cg06536614    27.4685599
## 38   7.547175154   9.69391892 27.097950316 cg02621446    27.0979503
## 39   0.000000000  27.00622824 24.128528762 cg02494911    27.0062282
## 40  20.457069989   0.00000000 26.624603334 cg12146221    26.6246033
## 41   0.000000000  25.79623162 26.592044153 cg00616572    26.5920442
## 42   9.536847794  26.42367491  5.644628072 cg10750306    26.4236749
## 43  26.167765420   7.87816983  6.038449805 cg15535896    26.1677654
## 44   1.140068212  25.93700297 13.651063404 cg01667144    25.9370030
## 45   0.000000000  25.63682735 13.470826797 cg24861747    25.6368274
## 46  25.542272183  15.10162869  0.000000000 cg10240127    25.5422722
## 47  24.118999307   0.00000000 25.125563097 cg02372404    25.1255631
## 48   1.111147195   8.19926972 25.064949566 cg06715136    25.0649496
## 49  24.852255463   0.00000000 16.124340115 cg20685672    24.8522555
## 50   0.000000000  24.78192605 14.617043174 cg05570109    24.7819260
## 51  24.731682044   0.00000000 13.462559109 cg04248279    24.7316820
## 52   4.046691734   5.49746420 24.336610598 cg20678988    24.3366106
## 53   0.000000000  24.20191047 18.411721155 cg12534577    24.2019105
## 54   0.000000000  24.13895507 15.855364668 cg16579946    24.1389551
## 55   4.826389136  24.12762377  5.718101058 cg12738248    24.1276238
## 56   6.529632908   5.92909446 24.064967745 cg16771215    24.0649677
## 57  24.017430734  10.17069127  0.028010156 cg13080267    24.0174307
## 58   5.507030815   5.66722760 23.062350711 cg17738613    23.0623507
## 59  22.325509240   6.54751969  5.652277581 cg11331837    22.3255092
## 60   0.000000000  22.28286919 17.240406275 cg01680303    22.2828692
## 61  22.209623590   0.00000000 13.211566870 cg04412904    22.2096236
## 62   0.000000000  22.09277799 14.934897153 cg18821122    22.0927780
## 63   3.418961026   7.32533965 22.057112379 cg12682323    22.0571124
## 64  22.043809258  16.24182372  0.000000000 cg02356645    22.0438093
## 65   0.000000000  20.82019047 22.036364160 cg24873924    22.0363642
## 66   0.000000000  15.80301510 22.028192936 cg10369879    22.0281929
## 67   6.482408270  21.72869906  0.939018125 cg01013522    21.7286991
## 68  16.476106166   0.00000000 21.606900872 cg12228670    21.6069009
## 69   7.504673650  21.12600379  0.000000000 cg07523188    21.1260038
## 70  21.105189244  18.08808609  0.000000000 cg15775217    21.1051892
## 71  21.025740955   0.00000000 16.866708890 cg03071582    21.0257410
## 72  20.948619901   0.00000000 12.120945430 cg05234269    20.9486199
## 73   0.000000000  20.90529433  7.902213775 cg20507276    20.9052943
## 74   0.000000000  19.09507877 20.828114561 cg27341708    20.8281146
## 75  13.177017420  20.44183263  0.000000000 cg25561557    20.4418326
## 76  20.440424332   8.86938790  0.348103979 cg03088219    20.4404243
## 77  20.431484829   0.00000000 19.510679122 cg01921484    20.4314848
## 78   4.713093446  20.19254710  4.205227539 cg26069044    20.1925471
## 79  20.140991688   0.00000000  7.542878492 cg06112204    20.1409917
## 80  20.087109684   0.00000000 10.293148792 cg25758034    20.0871097
## 81  20.072192022   0.22939651  9.403893500 cg17421046    20.0721920
## 82  19.725429611   0.00000000 12.789799128 cg11438323    19.7254296
## 83  19.701064237   0.00000000  9.922748112 cg17429539    19.7010642
## 84  19.537050438  14.85961436  0.000000000 cg00322003    19.5370504
## 85  19.326127572   4.15858089  4.744396470 cg11187460    19.3261276
## 86   2.515250593   5.41832059 18.975214454 cg25879395    18.9752145
## 87   4.058436074  18.84858496  0.228016041 cg26474732    18.8485850
## 88   2.892526937  18.77824046  2.420405667 cg23161429    18.7782405
## 89   1.683266081   4.78596818 18.693202335 cg20370184    18.6932023
## 90  18.637545772   0.02146389  6.333570297 cg25436480    18.6375458
## 91   0.009452251   7.64519394 18.618965539 cg13885788    18.6189655
## 92  11.433741314  18.26499711  0.000000000 cg23916408    18.2649971
## 93   0.000000000  16.67198659 18.165974740 cg14527649    18.1659747
## 94   5.005436113   1.01254421 18.050518899 cg10738648    18.0505189
## 95   0.000000000  17.96896566 12.783854784 cg23658987    17.9689657
## 96   5.986548344  17.93929168  1.282747947 cg18339359    17.9392917
## 97  10.255499373   0.00000000 17.836580203 cg07480176    17.8365802
## 98  16.796579637  17.79312862  0.000000000 cg26757229    17.7931286
## 99   2.972810732  17.77785615  4.058961200 cg12284872    17.7778562
## 100  8.056052971  17.46817747  0.000000000 cg24506579    17.4681775
## 101 17.452539976   8.51228551  0.000000000 cg02932958    17.4525400
## 102 13.352445749   0.00000000 17.317905474 cg00272795    17.3179055
## 103  0.000000000   7.44221308 17.200532252 cg12784167    17.2005323
## 104 16.764866307   0.00000000  6.646108759 cg03660162    16.7648663
## 105  0.000000000  16.00794740 16.457904409 cg16178271    16.4579044
## 106 16.352599342   0.00000000 11.995443315 cg27577781    16.3525993
## 107 16.126430414   0.00000000  8.289563199 cg07138269    16.1264304
## 108 15.974517734   2.87942381  2.065851003 cg05321907    15.9745177
## 109  0.755584018  15.69076519  2.146573132 cg22274273    15.6907652
## 110  0.467255566   3.15546065 15.548529013 cg15865722    15.5485290
## 111 13.414416608  15.52962619  0.000000000 cg21209485    15.5296262
## 112 15.467767202   0.63649811  3.699141098 cg20139683    15.4677672
## 113  0.807255447  15.27246532  2.248226332 cg15633912    15.2724653
## 114  1.773934188  15.19614251  0.493410601 cg00675157    15.1961425
## 115  0.000000000  15.00431453 13.731857733 cg21854924    15.0043145
## 116  0.000000000   8.28396704 14.990763999 cg14564293    14.9907640
## 117  1.419752037  14.68128295  1.625955063 cg01933473    14.6812830
## 118 14.363558658   0.00000000  2.334430859 cg06950937    14.3635587
## 119  7.039607983   0.00000000 14.270472500 cg14293999    14.2704725
## 120  0.000000000   7.61428358 14.108315063 cg01128042    14.1083151
## 121 13.963221888   0.00000000 13.899173399 cg03327352    13.9632219
## 122 13.956539489   0.00000000  2.029774778 cg12776173    13.9565395
## 123  8.337809729   0.00000000 13.927313821 cg24851651    13.9273138
## 124  8.522498604   0.00000000 13.705406963 cg19377607    13.7054070
## 125 13.702247521   0.00000000  7.328782945 cg00696044    13.7022475
## 126  0.000000000   2.81157158 13.614359374 cg01153376    13.6143594
## 127 13.568528499   3.88094848  0.000000000 cg19512141    13.5685285
## 128  0.000000000   6.29526893 13.552242319 cg18819889    13.5522423
## 129  8.863676187   0.00000000 13.138149457 cg27272246    13.1381495
## 130 12.206523053   0.00000000 13.006438244 cg08198851    13.0064382
## 131  0.000000000   9.80650980 12.678148985 cg06118351    12.6781490
## 132  4.065417376  12.40020977  0.000000000 cg10985055    12.4002098
## 133  0.922128548  11.76022383  0.005104054 cg16788319    11.7602238
## 134  1.039323446  11.74071914  0.000000000 cg14240646    11.7407191
## 135  0.791497817  11.56711305  0.394794257 cg00999469    11.5671130
## 136  0.000000000  11.34388307 10.937086239 cg12012426    11.3438831
## 137  0.000000000   2.69796677 10.877759543 cg01549082    10.8777595
## 138 10.745029758   0.00000000  9.148608471 cg21697769    10.7450298
## 139 10.662680862   0.00000000  7.596063644 cg07028768    10.6626809
## 140 10.319768459   3.95272245  0.000000000 cg17906851    10.3197685
## 141  0.000000000   8.38289397  9.799519490 cg27086157     9.7995195
## 142  0.310175184   9.75654690  0.000000000 cg06697310     9.7565469
## 143  9.736065016   9.24207711  0.000000000 cg08584917     9.7360650
## 144  0.606268787   9.52991966  0.000000000 cg02320265     9.5299197
## 145  2.501266350   0.00000000  9.497362665 cg04664583     9.4973627
## 146  4.880778567   0.00000000  8.724030662 cg14307563     8.7240307
## 147  6.240259640   0.00000000  8.437061685 cg08779649     8.4370617
## 148  0.000000000   6.06486468  7.341051542 cg00154902     7.3410515
## 149  0.000000000   0.00000000  6.380539190 cg12466610     6.3805392
## 150  6.357489751   4.09173156  0.000000000 cg27639199     6.3574898
## 151  0.000000000   5.86104601  4.818679307 cg00689685     5.8610460
## 152  0.000000000   2.98997653  5.167419744 cg15501526     5.1674197
## 153  2.833682181   0.00000000  0.000000000 cg01413796     2.8336822
## 154  0.421065431   0.00000000  0.566138305    age.now     0.5661383
## 155  0.000000000   0.43502977  0.045856942 cg11247378     0.4350298
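The version notes also mention mean- and median-based selection as alternatives to the per-class maximum (pmax) used above. A minimal sketch of those aggregations, assuming the same CN / Dementia / MCI importance columns (the three rows here are illustrative values, not a replacement for the table above):

```r
# Sketch: aggregate per-class importances by mean or median instead of pmax.
imp <- data.frame(
  CN       = c(80.7, 38.8, 56.2),
  Dementia = c(100.0, 80.7, 9.1),
  MCI      = c(0.0, 0.0, 33.5),
  Feature  = c("PC1", "PC2", "cg00962106")
)
cls <- c("CN", "Dementia", "MCI")
imp$MeanImportance   <- rowMeans(imp[, cls])
imp$MedianImportance <- apply(imp[, cls], 1, median)

# rank features by the chosen aggregate, analogous to arrange(desc(...))
imp[order(-imp$MeanImportance),
    c("Feature", "MeanImportance", "MedianImportance")]
```

Either aggregate can then replace MaxImportance in the selection step; the median is less sensitive to a single class dominating a feature's score.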
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##           CN   Dementia       MCI    Feature MaxImportance
## 1  80.688499 100.000000  0.000000        PC1     100.00000
## 2  38.814294  80.731170  0.000000        PC2      80.73117
## 3  56.201954   9.099120 33.493095 cg00962106      56.20195
## 4   7.467342   0.000000 55.895233        PC3      55.89523
## 5  26.324357  48.654284  6.540367 cg19503462      48.65428
## 6  47.909845  21.190038  8.081999 cg27452255      47.90984
## 7  27.972227  45.991972  1.294371 cg07152869      45.99197
## 8   3.340505  45.589838 28.315600 cg05096415      45.58984
## 9  18.264220  12.784211 45.587783 cg02225060      45.58778
## 10 45.329063   8.655034 21.701060 cg14710850      45.32906
## 11 23.094859   5.926698 45.307401 cg02981548      45.30740
## 12 44.864165   0.000000 36.602763 cg08861434      44.86417
## 13 14.463095  42.033278 10.566239 cg03129555      42.03328
## 14 41.989180   6.878941 20.292656 cg23432430      41.98918
## 15  8.920054  17.009502 41.731798 cg16749614      41.73180
## 16  3.597324  40.136345 25.166658 cg17186592      40.13634
## 17  1.857039  38.981920 23.217726 cg14924512      38.98192
## 18 38.237415   7.571370 15.082696 cg09584650      38.23741
## 19 13.550341  38.080962 11.897814 cg06864789      38.08096
## 20 19.832323  37.855944  3.065422 cg03084184      37.85594
## [1] "the top 20 features based on max way:"
##  [1] "PC1"        "PC2"        "cg00962106" "PC3"        "cg19503462" "cg27452255" "cg07152869"
##  [8] "cg05096415" "cg02225060" "cg14710850" "cg02981548" "cg08861434" "cg03129555" "cg23432430"
## [15] "cg16749614" "cg17186592" "cg14924512" "cg09584650" "cg06864789" "cg03084184"

if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8505
## The AUC value for class CN is: 0.850513 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8357
## The AUC value for class Dementia is: 0.8357143 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8188
## The AUC value for class MCI is: 0.8188266

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}
## The mean AUC value across all classes with one versus rest method is: 0.835018
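Before moving on to the elastic net, the frequency / common-feature selection described in the version notes can be sketched: take the top-N features from each trained model, count how often each feature appears, and keep those appearing in more than half of the models. The feature lists below are illustrative, not taken from the outputs above:

```r
# Sketch: frequency-based "common feature" selection across models.
# Step 1: top-N feature lists from each model (illustrative names).
top_lists <- list(
  LRM1 = c("PC1", "PC2", "cg00962106", "cg02225060"),
  LRM2 = c("PC1", "PC2", "cg00962106", "cg19503462"),
  ENM1 = c("PC1", "cg00962106", "cg02225060", "cg19503462")
)

# Step 2: frequency of each feature across the top-N lists.
freq <- table(unlist(top_lists))

# Step 3: keep features that appear in more than half of the models.
common_features <- names(freq[freq > length(top_lists) / 2])
common_features
```

With three models the threshold is 1.5, so any feature appearing in at least two of the top lists is kept as a common feature.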

9.3.3 Elastic Net

9.3.3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)

param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))

elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)

print(elastic_net_model1)
## glmnet 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa     
##   0      0.00100000  0.6571736  0.42345797
##   0      0.05357895  0.6725349  0.43439423
##   0      0.10615789  0.6747338  0.43094148
##   0      0.15873684  0.6725599  0.42391171
##   0      0.21131579  0.6725837  0.41818370
##   0      0.26389474  0.6770526  0.42406079
##   0      0.31647368  0.6769804  0.41856449
##   0      0.36905263  0.6726087  0.40853473
##   0      0.42163158  0.6638170  0.38542265
##   0      0.47421053  0.6660148  0.38902178
##   0      0.52678947  0.6594214  0.37628816
##   0      0.57936842  0.6550252  0.36510400
##   0      0.63194737  0.6528274  0.35927177
##   0      0.68452632  0.6418618  0.33471759
##   0      0.73710526  0.6352200  0.31832804
##   0      0.78968421  0.6307756  0.30720022
##   0      0.84226316  0.6263800  0.29777058
##   0      0.89484211  0.6220322  0.28739881
##   0      0.94742105  0.6220322  0.28739881
##   0      1.00000000  0.6220322  0.28682520
##   1      0.00100000  0.6240596  0.37352512
##   1      0.05357895  0.5187546  0.05457313
##   1      0.10615789  0.5142862  0.00000000
##   1      0.15873684  0.5142862  0.00000000
##   1      0.21131579  0.5142862  0.00000000
##   1      0.26389474  0.5142862  0.00000000
##   1      0.31647368  0.5142862  0.00000000
##   1      0.36905263  0.5142862  0.00000000
##   1      0.42163158  0.5142862  0.00000000
##   1      0.47421053  0.5142862  0.00000000
##   1      0.52678947  0.5142862  0.00000000
##   1      0.57936842  0.5142862  0.00000000
##   1      0.63194737  0.5142862  0.00000000
##   1      0.68452632  0.5142862  0.00000000
##   1      0.73710526  0.5142862  0.00000000
##   1      0.78968421  0.5142862  0.00000000
##   1      0.84226316  0.5142862  0.00000000
##   1      0.89484211  0.5142862  0.00000000
##   1      0.94742105  0.5142862  0.00000000
##   1      1.00000000  0.5142862  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2638947.
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.5868408
FeatEval_Freq_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Freq_mean_accuracy_cv_ENM1)
## [1] 0.5868408
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_ENM1$DX)

FeatEval_Freq_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.863736263736264"
print(FeatEval_Freq_ENM1_trainAccuracy)
## [1] 0.8637363
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Freq_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Freq_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       45        5  13
##   Dementia  0        8   0
##   MCI      21       15  86
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7202          
##                  95% CI : (0.6512, 0.7823)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 3.473e-09       
##                                           
##                   Kappa : 0.4987          
##                                           
##  Mcnemar's Test P-Value : 6.901e-05       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6818         0.28571     0.8687
## Specificity             0.8583         1.00000     0.6170
## Pos Pred Value          0.7143         1.00000     0.7049
## Neg Pred Value          0.8385         0.89189     0.8169
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2332         0.04145     0.4456
## Detection Prevalence    0.3264         0.04145     0.6321
## Balanced Accuracy       0.7700         0.64286     0.7429
cm_FeatEval_Freq_ENM1_Accuracy<-cm_FeatEval_Freq_ENM1$overall["Accuracy"]
cm_FeatEval_Freq_ENM1_Kappa<-cm_FeatEval_Freq_ENM1$overall["Kappa"]
print(cm_FeatEval_Freq_ENM1_Accuracy)
##  Accuracy 
## 0.7202073
print(cm_FeatEval_Freq_ENM1_Kappa)
##     Kappa 
## 0.4986772
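Beyond `overall["Accuracy"]` and `overall["Kappa"]`, caret's `confusionMatrix` object also exposes the per-class statistics printed above through its `$byClass` matrix. A minimal, self-contained sketch (toy factor vectors, not the elastic net predictions above):

```r
library(caret)

# Toy three-class predictions, just to show the shape of $byClass;
# these are NOT the model predictions from the chunk above.
pred <- factor(c("CN", "CN", "MCI", "MCI", "Dementia", "MCI"),
               levels = c("CN", "Dementia", "MCI"))
ref  <- factor(c("CN", "MCI", "MCI", "MCI", "Dementia", "CN"),
               levels = c("CN", "Dementia", "MCI"))

cm <- caret::confusionMatrix(pred, ref)

# One row per class, with columns such as Sensitivity, Specificity and
# Balanced Accuracy -- the same statistics shown in the output above.
print(cm$byClass[, c("Sensitivity", "Specificity", "Balanced Accuracy")])
```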
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##               CN Dementia    MCI
## PC1        86.62  100.000 13.316
## PC2        68.41   88.605 20.130
## cg00962106 72.97   12.359 60.543
## cg02225060 43.13   18.828 62.025
## cg02981548 49.97    8.974 59.004
## cg23432430 57.29   15.760 41.467
## cg14710850 54.50    8.363 46.076
## cg16749614 20.68   33.684 54.424
## cg07152869 48.29   54.289  5.937
## cg08857872 29.00   24.415 53.478
## cg16652920 27.04   25.381 52.480
## cg26948066 51.16   42.093  9.006
## PC3        12.11   38.679 50.851
## cg08861434 48.60    1.032 49.700
## cg27452255 49.50   29.762 19.675
## cg09584650 48.11   20.546 27.504
## cg11133939 31.92   15.802 47.781
## cg19503462 47.24   44.926  2.252
## cg06864789 20.57   46.479 25.849
## cg02372404 30.74   14.684 45.487
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
  importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
  
  library(dplyr)
  Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>%
    arrange(desc(Overall))
  
  print(Ordered_importance_elastic_net_final_model1)
}
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case, take the maximum
  # importance value across the three classes for each feature
  # and add it as a MaxImportance column.
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_elastic_net_model1_df)
  
}
##              CN     Dementia        MCI    Feature MaxImportance
## 1   86.62050902 1.000000e+02 13.3160630        PC1   100.0000000
## 2   68.41216879 8.860540e+01 20.1298078        PC2    88.6054047
## 3   72.96575230 1.235886e+01 60.5434683 cg00962106    72.9657523
## 4   43.13322979 1.882820e+01 62.0248574 cg02225060    62.0248574
## 5   49.96585084 8.974481e+00 59.0037601 cg02981548    59.0037601
## 6   57.29080134 1.576006e+01 41.4673182 cg23432430    57.2908013
## 7   54.50228710 8.362895e+00 46.0759636 cg14710850    54.5022871
## 8   20.67688605 3.368387e+01 54.4241861 cg16749614    54.4241861
## 9   48.28878775 5.428948e+01  5.9372679 cg07152869    54.2894836
## 10  28.99951869 2.441523e+01 53.4781726 cg08857872    53.4781726
## 11  27.03639026 2.538054e+01 52.4803598 cg16652920    52.4803598
## 12  51.16229060 4.209275e+01  9.0061160 cg26948066    51.1622906
## 13  12.10910616 3.867890e+01 50.8514321        PC3    50.8514321
## 14  48.60433053 1.032191e+00 49.6999500 cg08861434    49.6999500
## 15  49.49968697 2.976165e+01 19.6746118 cg27452255    49.4996870
## 16  48.11340007 2.054619e+01 27.5037772 cg09584650    48.1134001
## 17  31.91638249 1.580156e+01 47.7813735 cg11133939    47.7813735
## 18  47.24092693 4.492567e+01  2.2518257 cg19503462    47.2409269
## 19  20.56659243 4.647858e+01 25.8485628 cg06864789    46.4785832
## 20  30.73957153 1.468363e+01 45.4866296 cg02372404    45.4866296
## 21  13.69533098 4.531490e+01 31.5561459 cg24859648    45.3149049
## 22  10.38181326 3.472108e+01 45.1663236 cg14527649    45.1663236
## 23  44.71184790 3.266512e+01 11.9832982 cg03982462    44.7118479
## 24  43.77940241 1.498799e+01 28.7279844 cg06536614    43.7794024
## 25   0.05859366 4.329987e+01 43.1778488 cg17186592    43.2998704
## 26  26.35094279 1.675569e+01 43.1700654 cg26219488    43.1700654
## 27  42.96455132 1.408192e+01 28.8191985 cg10240127    42.9645513
## 28  13.43785370 4.289988e+01 29.3985988 cg00247094    42.8998806
## 29  35.47328840 6.861122e+00 42.3978384 cg20685672    42.3978384
## 30   3.59329081 4.215453e+01 38.4978073 cg25259265    42.1545261
## 31  42.14098876 1.425796e+01 27.8195995 cg16715186    42.1409888
## 32   0.72641363 4.194272e+01 41.1528826 cg05096415    41.9427243
## 33  34.83565307 4.176144e+01  6.8623616 cg15775217    41.7614426
## 34  15.96506670 4.058693e+01 24.5584393 cg24861747    40.5869340
## 35  34.02724408 6.215796e+00 40.3064684 cg07028768    40.3064684
## 36   4.43188361 3.973411e+01 35.2388010 cg14924512    39.7341126
## 37  24.98036059 3.964332e+01 14.5995338 cg03084184    39.6433224
## 38   4.46860104 3.906794e+01 34.5359093 cg05570109    39.0679383
## 39  34.88126397 3.997365e+00 38.9420569 cg01921484    38.9420569
## 40   9.76423401 2.779011e+01 37.6177708 cg00154902    37.6177708
## 41  28.32432504 3.744025e+01  9.0524945 cg26757229    37.4402476
## 42  37.35444255 9.845102e+00 27.4459127 cg03660162    37.3544426
## 43  35.88197667 5.225476e-01 36.4679523 cg12228670    36.4679523
## 44   4.42287463 3.174001e+01 36.2263084 cg00616572    36.2263084
## 45  14.12090749 3.616528e+01 21.9809420 cg20507276    36.1652775
## 46   5.45819324 3.544696e+01 29.9253415 cg05841700    35.4469628
## 47  21.86551281 1.351449e+01 35.4434316 cg06715136    35.4434316
## 48  22.83649675 1.227241e+01 35.1723309 cg02621446    35.1723309
## 49  18.36622546 3.502290e+01 16.5932465 cg12738248    35.0228999
## 50  14.22686588 3.493731e+01 20.6470141 cg09854620    34.9373080
## 51  32.22108254 3.481801e+01  2.5335040 cg00322003    34.8180145
## 52   8.08392092 2.660860e+01 34.7559518 cg24873924    34.7559518
## 53  14.17904774 3.469767e+01 20.4551950 cg03129555    34.6976707
## 54  34.67519870 7.587119e+00 27.0246513 cg04412904    34.6751987
## 55  15.01097938 1.956984e+01 34.6442427 cg17738613    34.6442427
## 56  18.92309050 1.558852e+01 34.5750392 cg25879395    34.5750392
## 57  34.34052148 1.088586e+01 23.3912285 cg05234269    34.3405215
## 58  22.74814328 3.407060e+01 11.2590310 cg20913114    34.0706023
## 59   1.10432730 3.256996e+01 33.7377127 cg02494911    33.7377127
## 60  17.46539414 3.350897e+01 15.9801525 cg00675157    33.5089746
## 61  26.90531215 3.346397e+01  6.4952294 cg12279734    33.4639696
## 62  12.81006898 2.054691e+01 33.4204064 cg01153376    33.4204064
## 63  30.29072569 2.969637e+00 33.3237905 cg04248279    33.3237905
## 64  30.63924812 3.320584e+01  2.5031614 cg06697310    33.2058375
## 65  25.57263487 3.289020e+01  7.2541345 cg26474732    32.8901974
## 66  19.20126650 1.362518e+01 32.8898776 cg16771215    32.8898776
## 67   1.21419254 3.269657e+01 31.4189519 cg12534577    32.6965725
## 68  14.55277128 3.243786e+01 17.8216650 cg06378561    32.4378643
## 69  19.19190875 1.316032e+01 32.4156554 cg18819889    32.4156554
## 70  29.77425177 3.221985e+01  2.3821664 cg01013522    32.2198462
## 71   8.93886565 2.321185e+01 32.2141388 cg10369879    32.2141388
## 72  31.33704403 9.314914e+00 21.9587019 cg03327352    31.3370440
## 73  31.30160654 8.696956e+00 22.5412221 cg07138269    31.3016065
## 74  30.27943029 7.143411e-01 31.0571995 cg12146221    31.0571995
## 75  31.01419234 1.154350e+01 19.4072654 cg11227702    31.0141923
## 76  30.51179268 2.044969e-01 30.7797176 cg27577781    30.7797176
## 77  30.73490550 2.929819e+01  1.3732876 cg02356645    30.7349055
## 78  10.88639219 1.960561e+01 30.5554321 cg15865722    30.5554321
## 79  21.12443442 3.052442e+01  9.3365531 cg18339359    30.5244155
## 80  21.72330090 3.049950e+01  8.7127705 cg08584917    30.4994994
## 81  30.48083627 1.623341e+01 14.1840006 cg15535896    30.4808363
## 82   9.34548938 3.034926e+01 20.9403427 cg01680303    30.3492601
## 83   0.66026098 2.956826e+01 30.2919448 cg01667144    30.2919448
## 84  17.55646953 2.993258e+01 12.3126775 cg07523188    29.9325751
## 85  12.72027022 1.708320e+01 29.8669009 cg21854924    29.8669009
## 86   9.99154586 2.974237e+01 19.6873945 cg10750306    29.7423684
## 87   5.72162167 2.961493e+01 23.8298800 cg16579946    29.6149297
## 88  29.45266239 5.868392e+00 23.5208426 cg11438323    29.4526624
## 89   7.90101220 2.936462e+01 21.4001830 cg18821122    29.3646232
## 90  13.46830088 1.551925e+01 29.0509798 cg01128042    29.0509798
## 91  12.43894239 1.650670e+01 29.0090673 cg14564293    29.0090673
## 92  28.69826490 4.398695e-01 28.1949674 cg08198851    28.6982649
## 93  25.92092461 2.699178e+00 28.6835305 cg00696044    28.6835305
## 94  28.64621193 7.486104e+00 21.0966801 cg17421046    28.6462119
## 95  28.22427509 1.423333e+01 13.9275209 cg11331837    28.2242751
## 96   4.57983333 2.318275e+01 27.8260122 cg12682323    27.8260122
## 97  27.75391376 2.314589e+01  4.5445933 cg02932958    27.7539138
## 98   2.22968717 2.770318e+01 25.4100660 cg23658987    27.7031812
## 99  13.54188950 1.406081e+01 27.6661275 cg07480176    27.6661275
## 100 18.99323166 8.562681e+00 27.6193403 cg10738648    27.6193403
## 101 23.23802196 4.225899e+00 27.5273491 cg03071582    27.5273491
## 102 27.50544612 1.371725e+01 13.7247673 cg25758034    27.5054461
## 103  8.31672910 1.850480e+01 26.8849556 cg06118351    26.8849556
## 104 26.47262314 2.668285e+01  0.1468021 cg19512141    26.6828533
## 105 15.77531966 2.662511e+01 10.7863633 cg23161429    26.6251110
## 106 13.98131160 2.639430e+01 12.3495607 cg11247378    26.3943003
## 107 18.59047099 7.683583e+00 26.3374815 cg20678988    26.3374815
## 108 14.37104595 1.154409e+01 25.9785682 cg27086157    25.9785682
## 109 25.84351166 9.775209e+00 16.0048742 cg03088219    25.8435117
## 110 13.62701522 2.527486e+01 11.5844190 cg22274273    25.2748622
## 111  2.73162960 2.236009e+01 25.1551464 cg13885788    25.1551464
## 112  7.96956513 1.668287e+01 24.7158621 cg14240646    24.7158621
## 113 23.64352178 7.872467e-01 24.4941965 cg06112204    24.4941965
## 114 24.37778097 4.912532e+00 19.4018209 cg17429539    24.3777810
## 115 23.05300756 2.435067e+01  1.2342388 cg25561557    24.3506743
## 116 21.11637573 3.134858e+00 24.3146618 cg14293999    24.3146618
## 117 15.52345785 8.640339e+00 24.2272249 cg19377607    24.2272249
## 118 21.13573933 2.410962e+01  2.9104517 cg06950937    24.1096190
## 119 24.09497385 4.091576e+00 19.9399703 cg25436480    24.0949738
## 120 14.61521652 9.016215e+00 23.6948594 cg00272795    23.6948594
## 121 10.00915361 1.338532e+01 23.4578980 cg12012426    23.4578980
## 122 23.37900124 1.718161e+01  6.1339628 cg05321907    23.3790012
## 123 23.15395979 9.972972e+00 13.1175593 cg20139683    23.1539598
## 124  0.72092593 2.312580e+01 22.3414416 cg26069044    23.1257956
## 125 21.02472207 2.241615e+01  1.3279969 cg23916408    22.4161470
## 126  0.60251447 2.222861e+01 21.5626641 cg27341708    22.2286066
## 127 15.97348851 2.221336e+01  6.1764438 cg13080267    22.2133604
## 128 21.86060829 1.296764e+00 20.5004165 cg27272246    21.8606083
## 129  0.95815549 2.184265e+01 20.8210630 cg12284872    21.8426466
## 130  2.40865385 2.169918e+01 19.2270978 cg00689685    21.6991797
## 131  2.01195112 2.152617e+01 19.4507897 cg16178271    21.5261688
## 132 21.27759398 8.124505e+00 13.0896611 cg21209485    21.2775940
## 133 20.58951338 1.059009e+01  9.9359937 cg24851651    20.5895134
## 134 20.33617165 7.329384e+00 12.9433597 cg21697769    20.3361716
## 135 20.32987702 6.214241e+00 14.0522083 cg04664583    20.3298770
## 136 14.63862353 1.993277e+01  5.2307158 cg00999469    19.9327674
## 137  2.27018549 1.742740e+01 19.7610183 cg20370184    19.7610183
## 138 18.97878439 4.183644e+00 14.7317123 cg11187460    18.9787844
## 139 18.43650302 1.998897e+00 16.3741776 cg12784167    18.4365030
## 140  1.20148356 1.698306e+01 18.2479666 cg02320265    18.2479666
## 141 17.49071043 1.357646e+01  3.8508273 cg12776173    17.4907104
## 142 17.27589672 1.271058e+00 15.9414108 cg08779649    17.2758967
## 143  8.18262162 8.988192e+00 17.2342417 cg01933473    17.2342417
## 144 17.18150883 8.948544e+00  8.1695367 cg15501526    17.1815088
## 145 13.77131645 1.693226e+01  3.0975134 cg10985055    16.9322578
## 146 16.16400549 6.749876e+00  9.3507013 cg17906851    16.1640055
## 147 11.29843847 4.708384e+00 16.0702505 cg14307563    16.0702505
## 148  4.33311418 1.431148e+01  9.9149370 cg16788319    14.3114792
## 149 11.34762178 1.384129e+01  2.4302450 cg24506579    13.8412948
## 150  9.52142351 1.242079e+01  2.8359369 cg27639199    12.4207884
## 151  1.91220728 1.029544e+01 12.2710740 cg12466610    12.2710740
## 152  9.00483857 2.188811e+00 11.2570774 cg15633912    11.2570774
## 153  0.00000000 1.116831e+01 11.2317426 cg01413796    11.2317426
## 154  1.45779360 1.885462e-01  1.7097678 cg01549082     1.7097678
## 155  0.70732781 5.928875e-03  0.7766847    age.now     0.7766847
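The `pmax()`/`arrange()` collapse used in the `METHOD_FEATURE_FLAG == 1` branch above can be checked on a tiny made-up importance table (cgA/cgB/cgC are hypothetical feature names, not real CpGs):

```r
library(dplyr)

# Toy per-class importance table with the same shape as
# importance_elastic_net_model1_df (fabricated numbers).
imp <- data.frame(
  CN       = c(80, 10, 40),
  Dementia = c(20, 90, 30),
  MCI      = c(50, 15, 70),
  row.names = c("cgA", "cgB", "cgC")
)

imp$Feature <- rownames(imp)
imp <- imp %>%
  mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
  arrange(desc(MaxImportance))

# Features are now ordered by their best class-wise importance:
# cgB (90), cgA (80), cgC (70).
print(imp$Feature)
```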
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("The top 20 features ranked by maximum class-wise importance:")
  print(head(importance_elastic_net_model1_df, n = 20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##          CN   Dementia       MCI    Feature MaxImportance
## 1  86.62051 100.000000 13.316063        PC1     100.00000
## 2  68.41217  88.605405 20.129808        PC2      88.60540
## 3  72.96575  12.358856 60.543468 cg00962106      72.96575
## 4  43.13323  18.828200 62.024857 cg02225060      62.02486
## 5  49.96585   8.974481 59.003760 cg02981548      59.00376
## 6  57.29080  15.760055 41.467318 cg23432430      57.29080
## 7  54.50229   8.362895 46.075964 cg14710850      54.50229
## 8  20.67689  33.683872 54.424186 cg16749614      54.42419
## 9  48.28879  54.289484  5.937268 cg07152869      54.28948
## 10 28.99952  24.415226 53.478173 cg08857872      53.47817
## 11 27.03639  25.380542 52.480360 cg16652920      52.48036
## 12 51.16229  42.092747  9.006116 cg26948066      51.16229
## 13 12.10911  38.678898 50.851432        PC3      50.85143
## 14 48.60433   1.032191 49.699950 cg08861434      49.69995
## 15 49.49969  29.761647 19.674612 cg27452255      49.49969
## 16 48.11340  20.546195 27.503777 cg09584650      48.11340
## 17 31.91638  15.801563 47.781374 cg11133939      47.78137
## 18 47.24093  44.925673  2.251826 cg19503462      47.24093
## 19 20.56659  46.478583 25.848563 cg06864789      46.47858
## 20 30.73957  14.683630 45.486630 cg02372404      45.48663
## [1] "The top 20 features ranked by maximum class-wise importance:"
##  [1] "PC1"        "PC2"        "cg00962106" "cg02225060" "cg02981548" "cg23432430" "cg14710850"
##  [8] "cg16749614" "cg07152869" "cg08857872" "cg16652920" "cg26948066" "PC3"        "cg08861434"
## [15] "cg27452255" "cg09584650" "cg11133939" "cg19503462" "cg06864789" "cg02372404"

if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_ENM1_AUC<-auc_value

  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_ENM1_AUC<-auc_value

  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_ENM1_AUC<-auc_value

  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 2:(length(classes) + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8682
## The AUC value for class CN is: 0.8681699 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8656
## The AUC value for class Dementia is: 0.8655844 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8361
## The AUC value for class MCI is: 0.8361272

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_ENM1_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8566272
print(FeatEval_Freq_ENM1_AUC)
## [1] 0.8566272
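The one-versus-rest macro-average computed above follows a simple recipe: dichotomize the labels once per class, take a binary AUC each time, then average. A self-contained sketch with fabricated labels and probabilities (pROC attached, as above):

```r
library(pROC)

# Fabricated three-class labels and class-probability matrix, shaped
# like the output of predict(..., type = "prob").
set.seed(1)
labels <- factor(sample(c("CN", "Dementia", "MCI"), 60, replace = TRUE))
probs  <- matrix(runif(60 * 3), ncol = 3,
                 dimnames = list(NULL, c("CN", "Dementia", "MCI")))
probs  <- probs / rowSums(probs)  # normalize rows to sum to 1

# One binary AUC per class (class vs. rest), then the macro-average.
auc_ovr <- sapply(levels(labels), function(cl) {
  as.numeric(auc(roc(as.integer(labels == cl), probs[, cl], quiet = TRUE)))
})
print(auc_ovr)
print(mean(auc_ovr))  # near 0.5 here, since the probabilities are random
```

pROC also ships `multiclass.roc()`, which computes a Hand-and-Till style multiclass AUC; it is a related but not identical summary to this one-versus-rest macro-average.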

9.3.4. XGBoost

9.3.4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa    
##   0.3  1          0.6               0.50        50      0.5626390  0.2014997
##   0.3  1          0.6               0.50       100      0.5624696  0.2158945
##   0.3  1          0.6               0.50       150      0.5469873  0.1948979
##   0.3  1          0.6               0.75        50      0.5274964  0.1270203
##   0.3  1          0.6               0.75       100      0.5715756  0.2190828
##   0.3  1          0.6               0.75       150      0.5693539  0.2286734
##   0.3  1          0.6               1.00        50      0.5210023  0.1059047
##   0.3  1          0.6               1.00       100      0.5452243  0.1644825
##   0.3  1          0.6               1.00       150      0.5540405  0.1910353
##   0.3  1          0.8               0.50        50      0.5648851  0.1998096
##   0.3  1          0.8               0.50       100      0.5999310  0.2842385
##   0.3  1          0.8               0.50       150      0.5956554  0.2794328
##   0.3  1          0.8               0.75        50      0.5430520  0.1516979
##   0.3  1          0.8               0.75       100      0.5649578  0.2061207
##   0.3  1          0.8               0.75       150      0.5605372  0.2043284
##   0.3  1          0.8               1.00        50      0.5232468  0.1074828
##   0.3  1          0.8               1.00       100      0.5474954  0.1693409
##   0.3  1          0.8               1.00       150      0.5737490  0.2274695
##   0.3  2          0.6               0.50        50      0.5407788  0.1714629
##   0.3  2          0.6               0.50       100      0.5407554  0.1718381
##   0.3  2          0.6               0.50       150      0.5495472  0.1892652
##   0.3  2          0.6               0.75        50      0.5648134  0.2046993
##   0.3  2          0.6               0.75       100      0.5913569  0.2486865
##   0.3  2          0.6               0.75       150      0.5847879  0.2443570
##   0.3  2          0.6               1.00        50      0.5363853  0.1443827
##   0.3  2          0.6               1.00       100      0.5628327  0.1953351
##   0.3  2          0.6               1.00       150      0.5826623  0.2366868
##   0.3  2          0.8               0.50        50      0.5890614  0.2501737
##   0.3  2          0.8               0.50       100      0.5824935  0.2452226
##   0.3  2          0.8               0.50       150      0.5912842  0.2616974
##   0.3  2          0.8               0.75        50      0.5692573  0.1988720
##   0.3  2          0.8               0.75       100      0.5693051  0.2036385
##   0.3  2          0.8               0.75       150      0.5736529  0.2183813
##   0.3  2          0.8               1.00        50      0.5386065  0.1523143
##   0.3  2          0.8               1.00       100      0.5386542  0.1511371
##   0.3  2          0.8               1.00       150      0.5694261  0.2109708
##   0.3  3          0.6               0.50        50      0.5668912  0.2142216
##   0.3  3          0.6               0.50       100      0.5999320  0.2739081
##   0.3  3          0.6               0.50       150      0.6000287  0.2767759
##   0.3  3          0.6               0.75        50      0.5827101  0.2269304
##   0.3  3          0.6               0.75       100      0.5893035  0.2437535
##   0.3  3          0.6               0.75       150      0.5870569  0.2464742
##   0.3  3          0.6               1.00        50      0.5626390  0.1938117
##   0.3  3          0.6               1.00       100      0.5603440  0.1981710
##   0.3  3          0.6               1.00       150      0.5648373  0.2066440
##   0.3  3          0.8               0.50        50      0.5651516  0.2022861
##   0.3  3          0.8               0.50       100      0.5825174  0.2372226
##   0.3  3          0.8               0.50       150      0.5913086  0.2593325
##   0.3  3          0.8               0.75        50      0.5735802  0.2132218
##   0.3  3          0.8               0.75       100      0.5779763  0.2221561
##   0.3  3          0.8               0.75       150      0.5890136  0.2432604
##   0.3  3          0.8               1.00        50      0.5627600  0.1925487
##   0.3  3          0.8               1.00       100      0.5626140  0.1938372
##   0.3  3          0.8               1.00       150      0.5648851  0.2019155
##   0.4  1          0.6               0.50        50      0.5518193  0.1871534
##   0.4  1          0.6               0.50       100      0.5606355  0.2125577
##   0.4  1          0.6               0.50       150      0.5759473  0.2497633
##   0.4  1          0.6               0.75        50      0.5407804  0.1628039
##   0.4  1          0.6               0.75       100      0.5496443  0.1941811
##   0.4  1          0.6               0.75       150      0.5805118  0.2549604
##   0.4  1          0.6               1.00        50      0.5474237  0.1603859
##   0.4  1          0.6               1.00       100      0.5628078  0.2050572
##   0.4  1          0.6               1.00       150      0.5693284  0.2210329
##   0.4  1          0.8               0.50        50      0.5497409  0.1949540
##   0.4  1          0.8               0.50       100      0.5542093  0.2109399
##   0.4  1          0.8               0.50       150      0.5694739  0.2409194
##   0.4  1          0.8               0.75        50      0.5607315  0.1935090
##   0.4  1          0.8               0.75       100      0.5605855  0.2048830
##   0.4  1          0.8               0.75       150      0.5716728  0.2312368
##   0.4  1          0.8               1.00        50      0.5430514  0.1517067
##   0.4  1          0.8               1.00       100      0.5694012  0.2179684
##   0.4  1          0.8               1.00       150      0.5781929  0.2415106
##   0.4  2          0.6               0.50        50      0.5627106  0.2286323
##   0.4  2          0.6               0.50       100      0.5474226  0.1999111
##   0.4  2          0.6               0.50       150      0.5561671  0.2123446
##   0.4  2          0.6               0.75        50      0.5604656  0.2094602
##   0.4  2          0.6               0.75       100      0.5628078  0.2120169
##   0.4  2          0.6               0.75       150      0.5628333  0.2107193
##   0.4  2          0.6               1.00        50      0.5627350  0.2019805
##   0.4  2          0.6               1.00       100      0.5912831  0.2535422
##   0.4  2          0.6               1.00       150      0.5693529  0.2161640
##   0.4  2          0.8               0.50        50      0.5736035  0.2274106
##   0.4  2          0.8               0.50       100      0.5781680  0.2491219
##   0.4  2          0.8               0.50       150      0.5781202  0.2492017
##   0.4  2          0.8               0.75        50      0.5563115  0.1918094
##   0.4  2          0.8               0.75       100      0.5671556  0.2152489
##   0.4  2          0.8               0.75       150      0.5649573  0.2131570
##   0.4  2          0.8               1.00        50      0.5670101  0.2060126
##   0.4  2          0.8               1.00       100      0.5846902  0.2403611
##   0.4  2          0.8               1.00       150      0.5826129  0.2415605
##   0.4  3          0.6               0.50        50      0.5934103  0.2544464
##   0.4  3          0.6               0.50       100      0.5714795  0.2209056
##   0.4  3          0.6               0.50       150      0.5803196  0.2390760
##   0.4  3          0.6               0.75        50      0.5826368  0.2398459
##   0.4  3          0.6               0.75       100      0.5759457  0.2286999
##   0.4  3          0.6               0.75       150      0.5914286  0.2583358
##   0.4  3          0.6               1.00        50      0.5825434  0.2358005
##   0.4  3          0.6               1.00       100      0.5870574  0.2455986
##   0.4  3          0.6               1.00       150      0.5848357  0.2434322
##   0.4  3          0.8               0.50        50      0.5693773  0.2196675
##   0.4  3          0.8               0.50       100      0.5694256  0.2247098
##   0.4  3          0.8               0.50       150      0.5606339  0.2137666
##   0.4  3          0.8               0.75        50      0.5648123  0.1993573
##   0.4  3          0.8               0.75       100      0.5779753  0.2355511
##   0.4  3          0.8               0.75       150      0.5736035  0.2285789
##   0.4  3          0.8               1.00        50      0.5870584  0.2364816
##   0.4  3          0.8               1.00       100      0.5671795  0.2040362
##   0.4  3          0.8               1.00       150      0.5803907  0.2318105
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter
##  'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma =
##  0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
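caret searched its default xgbTree grid above. To make the search explicit, or to pin it to the winning combination (nrounds = 150, max_depth = 3, eta = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1, subsample = 0.5), a `tuneGrid` can be supplied. A minimal sketch using the built-in `iris` data as a stand-in, since `trainData_XGB1` is not reproduced here:

```r
library(caret)

# Pin caret's xgbTree search to the single hyperparameter combination
# selected above; iris/Species stand in for trainData_XGB1/DX so the
# sketch is self-contained.
xgb_grid <- expand.grid(
  nrounds = 150, max_depth = 3, eta = 0.3, gamma = 0,
  colsample_bytree = 0.6, min_child_weight = 1, subsample = 0.5
)
set.seed(123)
fit <- caret::train(
  Species ~ ., data = iris,
  method = "xgbTree",
  trControl = trainControl(method = "cv", number = 3),
  tuneGrid = xgb_grid, metric = "Accuracy"
)
print(nrow(fit$results))  # 1: only the pinned combination was evaluated
```

With a one-row grid, `train()` skips the grid search entirely, which also makes reruns of the chunk much faster.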
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.5677591
FeatEval_Freq_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Freq_mean_accuracy_cv_xgb)
## [1] 0.5677591
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Freq_xgb_trainAccuracy <- train_accuracy

print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Freq_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Freq_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Freq_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       38        6  15
##   Dementia  1        5   1
##   MCI      27       17  83
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6528          
##                  95% CI : (0.5811, 0.7198)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 5.951e-05       
##                                           
##                   Kappa : 0.3719          
##                                           
##  Mcnemar's Test P-Value : 9.466e-05       
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.5758         0.17857     0.8384
## Specificity             0.8346         0.98788     0.5319
## Pos Pred Value          0.6441         0.71429     0.6535
## Neg Pred Value          0.7910         0.87634     0.7576
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.1969         0.02591     0.4301
## Detection Prevalence    0.3057         0.03627     0.6580
## Balanced Accuracy       0.7052         0.58323     0.6851
cm_FeatEval_Freq_xgb_Accuracy <-cm_FeatEval_Freq_xgb$overall["Accuracy"]
cm_FeatEval_Freq_xgb_Kappa <-cm_FeatEval_Freq_xgb$overall["Kappa"]

print(cm_FeatEval_Freq_xgb_Accuracy)
##  Accuracy 
## 0.6528497
print(cm_FeatEval_Freq_xgb_Kappa)
##     Kappa 
## 0.3718547
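The Kappa reported above corrects raw accuracy for agreement expected by chance. As an illustration (toy confusion matrix, not this model's predictions), Cohen's kappa can be computed directly from any confusion matrix, matching the definition behind the value `caret::confusionMatrix` reports:

```r
# Cohen's kappa from a confusion matrix (toy 2x2 example, base R only)
kappa_from_cm <- function(cm) {
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                     # observed agreement
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

toy_cm <- matrix(c(40, 10,
                    5, 45), nrow = 2, byrow = TRUE)
kappa_from_cm(toy_cm)  # 0.7
```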
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 155)
## 
##            Overall
## age.now     100.00
## cg08857872   90.21
## cg00962106   86.79
## cg14564293   80.36
## cg15501526   79.95
## cg02225060   78.88
## cg00154902   77.96
## cg02621446   69.80
## cg25259265   63.43
## cg05096415   61.27
## cg01013522   59.75
## cg16771215   58.96
## cg05234269   58.59
## cg02981548   55.91
## cg26948066   55.78
## cg00696044   55.74
## cg04248279   54.60
## cg17186592   53.42
## cg01933473   50.41
## cg01153376   49.92
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain        Cover   Frequency   Importance
##          <char>        <num>        <num>       <num>        <num>
##   1:    age.now 0.0206450994 0.0258977236 0.012645422 0.0206450994
##   2: cg08857872 0.0186438510 0.0142802565 0.010116338 0.0186438510
##   3: cg00962106 0.0179437506 0.0144742747 0.017197774 0.0179437506
##   4: cg14564293 0.0166285957 0.0163139863 0.009104704 0.0166285957
##   5: cg15501526 0.0165441027 0.0120930070 0.008598887 0.0165441027
##  ---                                                              
## 151: cg20507276 0.0011166602 0.0026583831 0.005058169 0.0011166602
## 152: cg27577781 0.0010477016 0.0024262331 0.003540718 0.0010477016
## 153: cg10750306 0.0009064883 0.0014152294 0.004046535 0.0009064883
## 154: cg13080267 0.0008719851 0.0012592102 0.003540718 0.0008719851
## 155: cg04664583 0.0001936603 0.0006596944 0.002529084 0.0001936603
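Importance tables like the one above feed the frequency / common-feature selection described in the version notes: take the Top N features from each model, count how often each feature appears across those lists, and keep the features appearing in more than half of the models. A minimal base-R sketch with made-up feature lists (the names below are illustrative, not taken from the tables above):

```r
# Step 1: assumed top-N feature lists from each trained model (toy names)
top_features <- list(
  xgb = c("cg01", "cg02", "cg03", "age.now"),
  rf  = c("cg02", "cg03", "cg05", "age.now"),
  svm = c("cg02", "cg07", "age.now", "cg03")
)

# Step 2: frequency of appearance across the per-model top-N lists
freq <- table(unlist(top_features))

# Step 3: keep features that appear in more than half of the models
common_features <- sort(names(freq)[freq > length(top_features) / 2])
print(common_features)  # "age.now" "cg02" "cg03"
```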
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_xgb_AUC <- auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_xgb_AUC <- auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_xgb_AUC <- auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # One colour per class so the legend matches the plotted curves
  class_cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = class_cols[1], lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.7814
## The AUC value for class CN is: 0.7814364 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.687
## The AUC value for class Dementia is: 0.687013 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.7393
## The AUC value for class MCI is: 0.739308

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_xgb_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.7359191
print(FeatEval_Freq_xgb_AUC)
## [1] 0.7359191
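The mean AUC above is a macro average of one-vs-rest AUCs. For reference, each binary AUC is equivalent to the Mann-Whitney rank statistic; the following self-contained base-R sketch (toy labels and probabilities, not this model's output) shows the same macro-averaging scheme:

```r
# One-vs-rest macro AUC via the rank (Mann-Whitney) formulation of AUC
auc_binary <- function(is_case, scores) {
  r  <- rank(scores)
  n1 <- sum(is_case == 1); n0 <- sum(is_case == 0)
  (sum(r[is_case == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(42)
classes <- c("CN", "Dementia", "MCI")
y <- factor(sample(classes, 100, replace = TRUE), levels = classes)
probs <- matrix(runif(300), ncol = 3, dimnames = list(NULL, classes))
probs <- probs / rowSums(probs)  # toy class probabilities

aucs <- sapply(classes, function(cl) auc_binary(as.integer(y == cl), probs[, cl]))
mean(aucs)  # macro-averaged one-vs-rest AUC (near 0.5 for random scores)
```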

9.3.5. Random Forest

9.3.5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)

print(rf_model)
## Random Forest 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 365, 363, 364, 364 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##     2   0.5253246  0.02762106
##    78   0.5560471  0.13321329
##   155   0.5604916  0.14054833
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 155.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.5472878
FeatEval_Freq_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Freq_mean_accuracy_cv_rf)
## [1] 0.5472878
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")


train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Freq_rf_trainAccuracy<-train_accuracy
print(FeatEval_Freq_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Freq_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Freq_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       17        7  10
##   Dementia  0        0   0
##   MCI      49       21  89
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5492          
##                  95% CI : (0.4761, 0.6208)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 0.1747          
##                                           
##                   Kappa : 0.1284          
##                                           
##  Mcnemar's Test P-Value : 1.25e-11        
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity            0.25758          0.0000     0.8990
## Specificity            0.86614          1.0000     0.2553
## Pos Pred Value         0.50000             NaN     0.5597
## Neg Pred Value         0.69182          0.8549     0.7059
## Prevalence             0.34197          0.1451     0.5130
## Detection Rate         0.08808          0.0000     0.4611
## Detection Prevalence   0.17617          0.0000     0.8238
## Balanced Accuracy      0.56186          0.5000     0.5772
cm_FeatEval_Freq_rf_Accuracy<-cm_FeatEval_Freq_rf$overall["Accuracy"]
print(cm_FeatEval_Freq_rf_Accuracy)
##  Accuracy 
## 0.5492228
cm_FeatEval_Freq_rf_Kappa<-cm_FeatEval_Freq_rf$overall["Kappa"]
print(cm_FeatEval_Freq_rf_Kappa)
##     Kappa 
## 0.1283742
importance_rf_model <- varImp(rf_model)

print(importance_rf_model)
## rf variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 155)
## 
##               CN Dementia    MCI
## cg15501526 76.56    12.67 100.00
## age.now    47.68    49.97  65.92
## cg08857872 28.33    35.28  60.89
## cg01153376 19.27    31.95  59.45
## cg04412904 47.79    29.66  30.59
## cg11331837 28.17    35.84  47.43
## cg12279734 25.04    47.29  24.29
## cg23658987 46.86    24.79  27.31
## cg10240127 45.23    13.20  26.12
## cg27086157 29.79    11.55  44.90
## cg02621446 44.82    29.81  42.28
## cg00154902 22.94    43.98  41.47
## cg24506579 25.43    43.76  19.06
## cg00689685 35.66    41.97  19.30
## cg08198851 34.44    13.61  41.38
## cg10738648 40.75    21.39  34.86
## cg03129555 40.53    21.07  13.68
## cg02320265 11.83    33.98  40.30
## cg12228670 39.21    16.46  40.15
## cg25259265 30.86    36.43  39.91
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG == 5){
  
  importance_rf_final_model <- varImp(rf_model$finalModel)
  # arrange() drops row names, so keep the feature names in a column first
  importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
  
  library(dplyr)
  
  Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
  
  print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 6){
  
  importance_rf_final_model <- varImp(rf_model$finalModel)
  # arrange() drops row names, so keep the feature names in a column first
  importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
  
  library(dplyr)
  
  Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
  
  print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  importance_rf_final_model <- varImp(rf_model$finalModel)
  # arrange() drops row names, so keep the feature names in a column first
  importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
  
  library(dplyr)
  
  Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
  
  print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_rf_model_df)
  
}
##            CN  Dementia        MCI    Feature MaxImportance
## 1   76.557522 12.671541 100.000000 cg15501526     100.00000
## 2   47.675219 49.970307  65.921767    age.now      65.92177
## 3   28.334482 35.277523  60.891135 cg08857872      60.89113
## 4   19.274188 31.947251  59.452175 cg01153376      59.45217
## 5   47.790469 29.664061  30.586808 cg04412904      47.79047
## 6   28.165143 35.837651  47.429820 cg11331837      47.42982
## 7   25.038881 47.287396  24.289313 cg12279734      47.28740
## 8   46.860287 24.788670  27.314640 cg23658987      46.86029
## 9   45.232383 13.199544  26.119442 cg10240127      45.23238
## 10  29.794620 11.547286  44.896868 cg27086157      44.89687
## 11  44.816028 29.809458  42.279680 cg02621446      44.81603
## 12  22.935623 43.981809  41.465775 cg00154902      43.98181
## 13  25.428106 43.759595  19.056274 cg24506579      43.75960
## 14  35.657086 41.969176  19.300855 cg00689685      41.96918
## 15  34.444474 13.611592  41.381808 cg08198851      41.38181
## 16  40.753056 21.386297  34.859047 cg10738648      40.75306
## 17  40.534123 21.074843  13.676503 cg03129555      40.53412
## 18  11.829463 33.976102  40.295951 cg02320265      40.29595
## 19  39.206322 16.461197  40.148590 cg12228670      40.14859
## 20  30.855437 36.426905  39.906142 cg25259265      39.90614
## 21  18.800224 22.642347  39.562003 cg14710850      39.56200
## 22  26.927646 27.912295  39.310410 cg01933473      39.31041
## 23  28.027166 13.440389  38.727770 cg12466610      38.72777
## 24  23.150847 38.719307  28.005562 cg19512141      38.71931
## 25  34.098424  6.404334  38.633630 cg01921484      38.63363
## 26   2.321973 38.581729  13.038975 cg14307563      38.58173
## 27  38.466357 27.599100  28.386039 cg24861747      38.46636
## 28  29.832180 30.004096  38.465164 cg00962106      38.46516
## 29  20.868241 32.188190  38.200852 cg23916408      38.20085
## 30  38.199229 35.109378  38.010718 cg02225060      38.19923
## 31  32.813248 29.587642  37.770671 cg00616572      37.77067
## 32  26.756275 18.379489  37.624229 cg16652920      37.62423
## 33   8.558735 22.036059  37.485375        PC3      37.48538
## 34  25.191758 16.064688  37.199630 cg06715136      37.19963
## 35   9.905268 37.006968  22.922249 cg12284872      37.00697
## 36  15.665099 25.743729  36.962434 cg05096415      36.96243
## 37  23.716251 36.829351  26.997109 cg18821122      36.82935
## 38  30.366998 36.804164  13.954492 cg19503462      36.80416
## 39  31.885237 24.914528  36.664723 cg15865722      36.66472
## 40  23.312543 27.609555  36.510004 cg07028768      36.51000
## 41   8.134870 28.139986  36.484573 cg17906851      36.48457
## 42  36.459120 34.507209  31.464193 cg00999469      36.45912
## 43  28.738331 25.224942  36.200164 cg12146221      36.20016
## 44   7.749586 36.143999  13.522490 cg06697310      36.14400
## 45  36.019111 20.194962  32.815123 cg14293999      36.01911
## 46  25.127638 35.585410  34.654922 cg02494911      35.58541
## 47  33.110587 30.573956  35.307728 cg01667144      35.30773
## 48  35.281334 33.110445  29.224294 cg26069044      35.28133
## 49  23.499631 28.306483  34.908205 cg10750306      34.90820
## 50  21.957287 23.375092  34.851217 cg23432430      34.85122
## 51  34.709456 33.075014  13.472489 cg14564293      34.70946
## 52  29.164687 18.215493  34.574848 cg24873924      34.57485
## 53  23.997078 26.213829  34.437667 cg26948066      34.43767
## 54  18.505626 28.989207  34.349722 cg20139683      34.34972
## 55  34.301912 22.967946   9.104703 cg03084184      34.30191
## 56  31.138920 34.231962  27.764958 cg05321907      34.23196
## 57   5.548237 34.122444  33.127313 cg27341708      34.12244
## 58  27.036024 34.096317  33.376387 cg06112204      34.09632
## 59  21.011794 34.040792  15.561797 cg13885788      34.04079
## 60  11.227817 34.009731  17.268730 cg00247094      34.00973
## 61  33.982197 25.390858  31.641726 cg12776173      33.98220
## 62  26.030267 32.529626  33.893198 cg02932958      33.89320
## 63  33.840561 15.297753  30.966868 cg03660162      33.84056
## 64  30.260520 31.239948  33.696576 cg06378561      33.69658
## 65  27.187461 33.624404  30.548059 cg05841700      33.62440
## 66  33.585063 26.814741  16.392102 cg01680303      33.58506
## 67  26.426384 17.826922  33.540052 cg12012426      33.54005
## 68  33.356186 21.491224  16.797766 cg11247378      33.35619
## 69  30.685059 33.265336  26.224587 cg12682323      33.26534
## 70  26.239028 33.210415  15.029442 cg06536614      33.21041
## 71  29.680946 33.107219  28.301134 cg02372404      33.10722
## 72  18.245164 12.269336  32.985202 cg12738248      32.98520
## 73  27.469747 32.951864  29.995839 cg01413796      32.95186
## 74  27.558199 32.904776  13.894549 cg27577781      32.90478
## 75  32.881946 20.071800  21.381645 cg16771215      32.88195
## 76  32.861097 20.964997  20.609993 cg02356645      32.86110
## 77  16.306166 16.125543  32.799640        PC2      32.79964
## 78  20.659967 32.531707  19.021606 cg18819889      32.53171
## 79  23.801364 23.299663  32.457994 cg00322003      32.45799
## 80  14.174655 32.385627  15.260737 cg14527649      32.38563
## 81  22.707934 24.855162  32.171025 cg27272246      32.17103
## 82  13.171397 32.143854  23.934534 cg16749614      32.14385
## 83  30.085994 14.457693  32.140575 cg16788319      32.14057
## 84  32.127299 11.985407  28.995139 cg10369879      32.12730
## 85  23.898218 18.617903  32.109373 cg10985055      32.10937
## 86  18.519748 28.788120  32.102136        PC1      32.10214
## 87  20.602486 31.893799  22.148440 cg26474732      31.89380
## 88  23.252282 20.310270  31.798601 cg04248279      31.79860
## 89  15.645024 16.207616  31.662536 cg25879395      31.66254
## 90  16.935190 31.553067  23.395061 cg00675157      31.55307
## 91  31.453159 14.451961  17.444123 cg11438323      31.45316
## 92  23.553026 27.943710  31.431104 cg12784167      31.43110
## 93  31.422718 26.545908  19.831244 cg03088219      31.42272
## 94  19.151605 31.397392  13.684714 cg25561557      31.39739
## 95  22.831593 31.354120  25.575486 cg26757229      31.35412
## 96  31.335151 16.982945  12.502967 cg01013522      31.33515
## 97  29.681319 31.297956  24.964984 cg12534577      31.29796
## 98  30.967252 22.276823  30.641848 cg15535896      30.96725
## 99  30.249127 30.788217  25.551552 cg21209485      30.78822
## 100 14.521028 23.525064  30.692510 cg25758034      30.69251
## 101 30.587745 29.449598  27.017735 cg02981548      30.58775
## 102 29.978427  8.324429  27.918327 cg20685672      29.97843
## 103 28.019829 17.529064  29.975951 cg07138269      29.97595
## 104 19.362432 29.800373  27.134778 cg00272795      29.80037
## 105 12.463063 22.450755  29.651498 cg03071582      29.65150
## 106  9.639006 25.747210  29.304954 cg06950937      29.30495
## 107 29.287502  9.277803  15.968740 cg07523188      29.28750
## 108 10.471845 29.219275  24.146277 cg17186592      29.21928
## 109 29.062834 26.923298  17.453515 cg08584917      29.06283
## 110 29.008156 26.881553  24.833864 cg15775217      29.00816
## 111 28.898041 16.542972  23.500265 cg11187460      28.89804
## 112 28.498243 19.588062  20.558468 cg13080267      28.49824
## 113 14.736947 21.014955  28.393912 cg00696044      28.39391
## 114 23.053910 22.307916  28.382166 cg03982462      28.38217
## 115 28.329989 21.987090  25.804974 cg14240646      28.32999
## 116 28.325767 16.993197  19.351713 cg03327352      28.32577
## 117 14.287716 28.315537  14.325393 cg24851651      28.31554
## 118 26.820839 28.231727  20.523253 cg20370184      28.23173
## 119 18.581261 27.901487  23.766773 cg15633912      27.90149
## 120 18.656208 18.177485  27.776474 cg27639199      27.77647
## 121 27.700469 12.118715  21.581429 cg08779649      27.70047
## 122 22.632023 27.608406  18.684827 cg07152869      27.60841
## 123 27.583688 18.351688  23.958175 cg17421046      27.58369
## 124 27.502771 16.371358  25.903356 cg01128042      27.50277
## 125 27.438427 13.564315  23.800666 cg09584650      27.43843
## 126 19.321020 27.252073  17.255848 cg14924512      27.25207
## 127 19.916538 18.011699  27.102551 cg05234269      27.10255
## 128 25.160604 26.955728  27.071172 cg19377607      27.07117
## 129 24.824570 23.641732  26.942593 cg11133939      26.94259
## 130 26.077300 12.871914  26.887496 cg05570109      26.88750
## 131 26.821967 21.020844  25.702543 cg24859648      26.82197
## 132 13.518090 14.804948  26.803240 cg11227702      26.80324
## 133 26.609214 18.554915  20.623618 cg04664583      26.60921
## 134 23.420930 10.009226  26.429006 cg21854924      26.42901
## 135 19.925757 26.274847  23.587711 cg20913114      26.27485
## 136 12.953184 15.432140  26.213690 cg17738613      26.21369
## 137 26.107548 19.892290  22.551946 cg09854620      26.10755
## 138 26.020994 24.606325  15.409204 cg25436480      26.02099
## 139 16.236454 21.542269  25.879241 cg08861434      25.87924
## 140 11.775460 21.020269  25.754330 cg21697769      25.75433
## 141 22.790673 12.087909  25.517484 cg20678988      25.51748
## 142 20.719592 25.424682  21.273081 cg27452255      25.42468
## 143 21.655604 25.406488  20.002807 cg20507276      25.40649
## 144 19.329855 25.024138  22.288192 cg23161429      25.02414
## 145 24.369803 24.606859  20.308675 cg01549082      24.60686
## 146 16.323001 22.498629  24.606185 cg16579946      24.60619
## 147 24.367117 22.586984  21.086705 cg22274273      24.36712
## 148 14.876059 23.719784  24.335876 cg06864789      24.33588
## 149 23.991716 15.416105   0.000000 cg18339359      23.99172
## 150 23.097285 23.910924  17.064792 cg26219488      23.91092
## 151 21.156788 23.766333  19.172032 cg16178271      23.76633
## 152  5.224382 21.759585  11.946569 cg07480176      21.75959
## 153 12.977742 19.689652  19.476222 cg16715186      19.68965
## 154 14.523868 15.847457  18.882385 cg06118351      18.88238
## 155 15.989563  9.329725  14.835184 cg17429539      15.98956
if(METHOD_FEATURE_FLAG == 1){
  
importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_rf_model_df,n=20)$Feature)
  
importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
##          CN Dementia       MCI    Feature MaxImportance
## 1  76.55752 12.67154 100.00000 cg15501526     100.00000
## 2  47.67522 49.97031  65.92177    age.now      65.92177
## 3  28.33448 35.27752  60.89113 cg08857872      60.89113
## 4  19.27419 31.94725  59.45217 cg01153376      59.45217
## 5  47.79047 29.66406  30.58681 cg04412904      47.79047
## 6  28.16514 35.83765  47.42982 cg11331837      47.42982
## 7  25.03888 47.28740  24.28931 cg12279734      47.28740
## 8  46.86029 24.78867  27.31464 cg23658987      46.86029
## 9  45.23238 13.19954  26.11944 cg10240127      45.23238
## 10 29.79462 11.54729  44.89687 cg27086157      44.89687
## 11 44.81603 29.80946  42.27968 cg02621446      44.81603
## 12 22.93562 43.98181  41.46577 cg00154902      43.98181
## 13 25.42811 43.75960  19.05627 cg24506579      43.75960
## 14 35.65709 41.96918  19.30086 cg00689685      41.96918
## 15 34.44447 13.61159  41.38181 cg08198851      41.38181
## 16 40.75306 21.38630  34.85905 cg10738648      40.75306
## 17 40.53412 21.07484  13.67650 cg03129555      40.53412
## 18 11.82946 33.97610  40.29595 cg02320265      40.29595
## 19 39.20632 16.46120  40.14859 cg12228670      40.14859
## 20 30.85544 36.42690  39.90614 cg25259265      39.90614
## [1] "the top 20 features based on max way:"
##  [1] "cg15501526" "age.now"    "cg08857872" "cg01153376" "cg04412904" "cg11331837" "cg12279734"
##  [8] "cg23658987" "cg10240127" "cg27086157" "cg02621446" "cg00154902" "cg24506579" "cg00689685"
## [15] "cg08198851" "cg10738648" "cg03129555" "cg02320265" "cg12228670" "cg25259265"

if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_rf_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_rf_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_rf_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # One colour per class so the legend matches the plotted curves
  class_cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = class_cols[1], lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.6859
## The AUC value for class CN is: 0.6858745 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6633
## The AUC value for class Dementia is: 0.6633117 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.6451
## The AUC value for class MCI is: 0.6450677

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_rf_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6647513
print(FeatEval_Freq_rf_AUC)
## [1] 0.6647513

9.3.6. SVM

9.3.6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 455 samples
## 155 predictors
##   3 classes: 'CN', 'Dementia', 'MCI' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 364, 364, 363, 364, 365 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.6922456  0.4922408
##   0.50  0.7010368  0.5080797
##   1.00  0.7054080  0.5068655
## 
## Tuning parameter 'sigma' was held constant at a value of 0.003349724
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.003349724 and C = 1.
print(svm_model$bestTune)
##         sigma C
## 3 0.003349724 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.6995634
FeatEval_Freq_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Freq_mean_accuracy_cv_svm)
## [1] 0.6995634
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.958241758241758"
FeatEval_Freq_svm_trainAccuracy <- train_accuracy
print(FeatEval_Freq_svm_trainAccuracy)
## [1] 0.9582418
predictions <- predict(svm_model, newdata = test_data_SVM1)

cm_FeatEval_Freq_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Freq_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction CN Dementia MCI
##   CN       42        3  14
##   Dementia  5       18   4
##   MCI      19        7  81
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7306          
##                  95% CI : (0.6621, 0.7918)
##     No Information Rate : 0.513           
##     P-Value [Acc > NIR] : 5.406e-10       
##                                           
##                   Kappa : 0.5439          
##                                           
##  Mcnemar's Test P-Value : 0.5568          
## 
## Statistics by Class:
## 
##                      Class: CN Class: Dementia Class: MCI
## Sensitivity             0.6364         0.64286     0.8182
## Specificity             0.8661         0.94545     0.7234
## Pos Pred Value          0.7119         0.66667     0.7570
## Neg Pred Value          0.8209         0.93976     0.7907
## Prevalence              0.3420         0.14508     0.5130
## Detection Rate          0.2176         0.09326     0.4197
## Detection Prevalence    0.3057         0.13990     0.5544
## Balanced Accuracy       0.7513         0.79416     0.7708
cm_FeatEval_Freq_svm_Accuracy <- cm_FeatEval_Freq_svm$overall["Accuracy"]
cm_FeatEval_Freq_svm_Kappa <- cm_FeatEval_Freq_svm$overall["Kappa"]
print(cm_FeatEval_Freq_svm_Accuracy)
##  Accuracy 
## 0.7305699
print(cm_FeatEval_Freq_svm_Kappa)
##     Kappa 
## 0.5439426

Let’s take a look at the feature importance of the trained model.

library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 648 rows and 156 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg08861434      1.090141   1.140845      1.225352         0.1250000
## 2 cg10240127      1.087324   1.112676      1.112676         0.1219136
## 3 cg16579946      1.047887   1.112676      1.112676         0.1219136
## 4 cg25879395      1.084507   1.112676      1.126761         0.1219136
## 5 cg02225060      1.061972   1.098592      1.138028         0.1203704
## 6 cg00962106      1.000000   1.084507      1.095775         0.1188272
plot(importance_SVM)

library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
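The importance_SVM_df results can then be cut down to a top-N list of feature names; a minimal sketch, where “top_n_features” is a hypothetical helper and the “feature” / “importance” columns follow the FeatureImp$results layout printed above:

```r
# Order the permutation-importance results and keep the top-N feature names;
# in this workflow N would come from the Input Session (e.g. 40).
top_n_features <- function(imp_df, n) {
  ordered <- imp_df[order(imp_df$importance, decreasing = TRUE), ]
  head(ordered$feature, n)
}
# e.g. top_n_features(importance_SVM_df, 40)
```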
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_svm_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_svm_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The AUC value is:")
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_svm_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
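The three binary-classification branches above are identical except for which probability column is treated as the positive class. That mapping can be written down once; “positive_class_for_flag” is a hypothetical helper, not part of the original workflow:

```r
# Map METHOD_FEATURE_FLAG (binary cases 3-6) to the class whose predicted
# probability feeds the ROC curve.
positive_class_for_flag <- function(flag) {
  switch(as.character(flag),
         "3" = "CI",        # CN vs CI (MCI and Dementia merged)
         "4" = "Dementia",  # CN vs Dementia (AD)
         "5" = "MCI",       # CN vs MCI
         "6" = "Dementia")  # MCI vs Dementia
}
```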
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_SVM1$DX)
  
  # One-vs-rest ROC: treat each class in turn as the positive class
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes,
         col = c("blue", 2:length(classes) + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.5208
## The AUC value for class CN is: 0.5207588 
## 
## Class: Dementia 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.532
## The AUC value for class Dementia is: 0.5320346 
## 
## Class: MCI 
## 
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[,     class])
## 
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.5214
## The AUC value for class MCI is: 0.5213841

if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_svm_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.5247258
print(FeatEval_Freq_svm_AUC)
## [1] 0.5247258
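Each per-class AUC above comes from pROC’s roc() on 0/1 labels; the underlying quantity is the Mann-Whitney rank statistic, which can be computed in base R as a cross-check. “auc_rank” is a hypothetical helper, not part of the workflow; it matches pROC’s value when the direction is controls < cases, as in the output above:

```r
# One-vs-rest AUC via the rank (Mann-Whitney) statistic: the probability
# that a randomly chosen positive sample is scored above a negative one.
auc_rank <- function(labels, scores) {  # labels: 0/1 vector; scores: P(class)
  r <- rank(scores)                     # ties get average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
# e.g. for the CN class:
# auc_rank(ifelse(test_data_SVM1$DX == "CN", 1, 0), prob_predictions[, "CN"])
```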

10. Performance Metrics

In the INPUT Session, “Metrics_Table_Output_FLAG” controls whether the metrics of this file are written out. The output covers the model-training-stage metrics and the performance metrics of the key features selected by the mean, median, and frequency methods.

Feature_and_model_Metrics <- c("Training Accuracy", "Test Accuracy", "Test Kappa", "AUC", "Average Test Accuracy during Cross Validation")

ModelTrain_stage_Logistic_metrics_ModelTrainStage <- c(modelTrain_LRM1_trainAccuracy, cm_modelTrain_LRM1_Accuracy, cm_modelTrain_LRM1_Kappa,modelTrain_LRM1_AUC, modelTrain_mean_accuracy_cv_LRM1) 

ModelTrain_stage_Logistic_metrics_Feature_Mean<-c(FeatEval_Mean_LRM1_trainAccuracy,
cm_FeatEval_Mean_LRM1_Accuracy,cm_FeatEval_Mean_LRM1_Kappa,FeatEval_Mean_LRM1_AUC, FeatEval_Mean_mean_accuracy_cv_LRM1)

ModelTrain_stage_Logistic_metrics_Feature_Median<-c(FeatEval_Median_LRM1_trainAccuracy,
cm_FeatEval_Median_LRM1_Accuracy,cm_FeatEval_Median_LRM1_Kappa,FeatEval_Median_LRM1_AUC, FeatEval_Median_mean_accuracy_cv_LRM1)

ModelTrain_stage_Logistic_metrics_Feature_Freq<-c(FeatEval_Freq_LRM1_trainAccuracy,
cm_FeatEval_Freq_LRM1_Accuracy,cm_FeatEval_Freq_LRM1_Kappa,FeatEval_Freq_LRM1_AUC,FeatEval_Freq_mean_accuracy_cv_LRM1)

ModelTrain_stage_Logistic_metrics<-c(ModelTrain_stage_Logistic_metrics_ModelTrainStage, ModelTrain_stage_Logistic_metrics_Feature_Mean,ModelTrain_stage_Logistic_metrics_Feature_Median,ModelTrain_stage_Logistic_metrics_Feature_Freq)
ModelTrain_stage_ElasticNet_metrics_ModelTrainStage <- c(modelTrain_ENM1_trainAccuracy, cm_modelTrain_ENM1_Accuracy, cm_modelTrain_ENM1_Kappa,modelTrain_ENM1_AUC, modelTrain_mean_accuracy_cv_ENM1) 

ModelTrain_stage_ElasticNet_metrics_Feature_Mean<-c(FeatEval_Mean_ENM1_trainAccuracy,
cm_FeatEval_Mean_ENM1_Accuracy,cm_FeatEval_Mean_ENM1_Kappa,FeatEval_Mean_ENM1_AUC, FeatEval_Mean_mean_accuracy_cv_ENM1)

ModelTrain_stage_ElasticNet_metrics_Feature_Median<-c(FeatEval_Median_ENM1_trainAccuracy,
cm_FeatEval_Median_ENM1_Accuracy,cm_FeatEval_Median_ENM1_Kappa,FeatEval_Median_ENM1_AUC, FeatEval_Median_mean_accuracy_cv_ENM1)

ModelTrain_stage_ElasticNet_metrics_Feature_Freq<-c(FeatEval_Freq_ENM1_trainAccuracy,
cm_FeatEval_Freq_ENM1_Accuracy,cm_FeatEval_Freq_ENM1_Kappa,FeatEval_Freq_ENM1_AUC,FeatEval_Freq_mean_accuracy_cv_ENM1)

ModelTrain_stage_ElasticNet_metrics<-c(ModelTrain_stage_ElasticNet_metrics_ModelTrainStage, ModelTrain_stage_ElasticNet_metrics_Feature_Mean,ModelTrain_stage_ElasticNet_metrics_Feature_Median,ModelTrain_stage_ElasticNet_metrics_Feature_Freq)
ModelTrain_stage_XGBoost_metrics_ModelTrainStage <- c(modelTrain_xgb_trainAccuracy, cm_modelTrain_xgb_Accuracy, cm_modelTrain_xgb_Kappa,modelTrain_xgb_AUC, modelTrain_mean_accuracy_cv_xgb) 

ModelTrain_stage_XGBoost_metrics_Feature_Mean<-c(FeatEval_Mean_xgb_trainAccuracy,
cm_FeatEval_Mean_xgb_Accuracy,cm_FeatEval_Mean_xgb_Kappa,FeatEval_Mean_xgb_AUC, FeatEval_Mean_mean_accuracy_cv_xgb)

ModelTrain_stage_XGBoost_metrics_Feature_Median<-c(FeatEval_Median_xgb_trainAccuracy,
cm_FeatEval_Median_xgb_Accuracy,cm_FeatEval_Median_xgb_Kappa,FeatEval_Median_xgb_AUC, FeatEval_Median_mean_accuracy_cv_xgb)

ModelTrain_stage_XGBoost_metrics_Feature_Freq<-c(FeatEval_Freq_xgb_trainAccuracy,
cm_FeatEval_Freq_xgb_Accuracy,cm_FeatEval_Freq_xgb_Kappa,FeatEval_Freq_xgb_AUC,FeatEval_Freq_mean_accuracy_cv_xgb)

ModelTrain_stage_XGBoost_metrics<-c(ModelTrain_stage_XGBoost_metrics_ModelTrainStage, ModelTrain_stage_XGBoost_metrics_Feature_Mean,ModelTrain_stage_XGBoost_metrics_Feature_Median,ModelTrain_stage_XGBoost_metrics_Feature_Freq)
ModelTrain_stage_RandomForest_metrics_ModelTrainStage <- c(modelTrain_rf_trainAccuracy, cm_modelTrain_rf_Accuracy, cm_modelTrain_rf_Kappa,modelTrain_rf_AUC, modelTrain_mean_accuracy_cv_rf) 

ModelTrain_stage_RandomForest_metrics_Feature_Mean<-c(FeatEval_Mean_rf_trainAccuracy,
cm_FeatEval_Mean_rf_Accuracy,cm_FeatEval_Mean_rf_Kappa,FeatEval_Mean_rf_AUC, FeatEval_Mean_mean_accuracy_cv_rf)

ModelTrain_stage_RandomForest_metrics_Feature_Median<-c(FeatEval_Median_rf_trainAccuracy,
cm_FeatEval_Median_rf_Accuracy,cm_FeatEval_Median_rf_Kappa,FeatEval_Median_rf_AUC, FeatEval_Median_mean_accuracy_cv_rf)

ModelTrain_stage_RandomForest_metrics_Feature_Freq<-c(FeatEval_Freq_rf_trainAccuracy,
cm_FeatEval_Freq_rf_Accuracy,cm_FeatEval_Freq_rf_Kappa,FeatEval_Freq_rf_AUC,FeatEval_Freq_mean_accuracy_cv_rf)

ModelTrain_stage_RandomForest_metrics<-c(ModelTrain_stage_RandomForest_metrics_ModelTrainStage, ModelTrain_stage_RandomForest_metrics_Feature_Mean,ModelTrain_stage_RandomForest_metrics_Feature_Median,ModelTrain_stage_RandomForest_metrics_Feature_Freq)
ModelTrain_stage_SVM_metrics_ModelTrainStage <- c(modelTrain_svm_trainAccuracy, cm_modelTrain_svm_Accuracy, cm_modelTrain_svm_Kappa,modelTrain_svm_AUC, modelTrain_mean_accuracy_cv_svm) 

ModelTrain_stage_SVM_metrics_Feature_Mean<-c(FeatEval_Mean_svm_trainAccuracy,
cm_FeatEval_Mean_svm_Accuracy,cm_FeatEval_Mean_svm_Kappa,FeatEval_Mean_svm_AUC, FeatEval_Mean_mean_accuracy_cv_svm)

ModelTrain_stage_SVM_metrics_Feature_Median<-c(FeatEval_Median_svm_trainAccuracy,
cm_FeatEval_Median_svm_Accuracy,cm_FeatEval_Median_svm_Kappa,FeatEval_Median_svm_AUC, FeatEval_Median_mean_accuracy_cv_svm)

ModelTrain_stage_SVM_metrics_Feature_Freq<-c(FeatEval_Freq_svm_trainAccuracy,
cm_FeatEval_Freq_svm_Accuracy,cm_FeatEval_Freq_svm_Kappa,FeatEval_Freq_svm_AUC,FeatEval_Freq_mean_accuracy_cv_svm)

ModelTrain_stage_SVM_metrics<-c(ModelTrain_stage_SVM_metrics_ModelTrainStage, ModelTrain_stage_SVM_metrics_Feature_Mean,ModelTrain_stage_SVM_metrics_Feature_Median,ModelTrain_stage_SVM_metrics_Feature_Freq)
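The twenty metric vectors above all follow one naming pattern (prefix_model_trainAccuracy, cm_prefix_model_Accuracy, and so on), so they could equally be assembled programmatically. A sketch using get(); “collect_metrics” is a hypothetical helper and assumes all the named metric variables already exist:

```r
# Build one five-metric vector (training accuracy, test accuracy, test kappa,
# AUC, mean CV accuracy) for a given stage prefix and model tag by looking
# up the variables that follow the naming pattern used above.
collect_metrics <- function(prefix, model) {
  c(get(paste0(prefix, "_", model, "_trainAccuracy")),
    get(paste0("cm_", prefix, "_", model, "_Accuracy")),
    get(paste0("cm_", prefix, "_", model, "_Kappa")),
    get(paste0(prefix, "_", model, "_AUC")),
    get(paste0(prefix, "_mean_accuracy_cv_", model)))
}
# e.g. ModelTrain_stage_SVM_metrics is equivalent to:
# c(collect_metrics("modelTrain", "svm"),
#   collect_metrics("FeatEval_Mean", "svm"),
#   collect_metrics("FeatEval_Median", "svm"),
#   collect_metrics("FeatEval_Freq", "svm"))
```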
if(METHOD_FEATURE_FLAG==1){
  classifcationType = "Multiclass"
}
if(METHOD_FEATURE_FLAG==2){
  classifcationType = "Multiclass and use PCA"
}
if(METHOD_FEATURE_FLAG==3){
  classifcationType = "Binary"
}
if(METHOD_FEATURE_FLAG==4){
  classifcationType = "CN vs Dementia (AD)"
}
if(METHOD_FEATURE_FLAG==5){
  classifcationType = "CN vs MCI"
}
if(METHOD_FEATURE_FLAG==6){
  classifcationType = "MCI vs Dementia"
}
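The flag-to-label chain above can also be written as a single lookup; an equivalent sketch, where “classification_label_for_flag” is a hypothetical name:

```r
# One lookup replacing the six if statements: position i of the vector holds
# the label for METHOD_FEATURE_FLAG == i.
classification_label_for_flag <- function(flag) {
  c("Multiclass", "Multiclass and use PCA", "Binary",
    "CN vs Dementia (AD)", "CN vs MCI", "MCI vs Dementia")[flag]
}
# e.g. classifcationType <- classification_label_for_flag(METHOD_FEATURE_FLAG)
```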
Metrics_results_df <- data.frame()

library(dplyr)

Metrics_results_df <- data.frame(
  `Number_of_CpG_used` = rep(Number_N_TopNCpGs, 20),
  `Number_of_Phenotype_Features_Used` = rep(5, 20),
  `Total_Number_of_features_before_Preprocessing` = rep(Number_N_TopNCpGs+5, 20),
  `Number_of_features_after_processing` = rep(Num_feaForProcess, 20),
  `Classification_Type` = rep(classifcationType, 20),
  `Number_of_Key_features_Selected_(Mean,Median)` = rep(INPUT_NUMBER_FEATURES, 20),
  `Number_of_Key_features_remained_based_on_frequency_methods` = rep(Num_KeyFea_Frequency, 20),
  `Metrics_Stage` = c(rep("Model Train Stage",5),rep("Key Feature Evaluation (Select based on Mean) ",5),rep("Key Feature Evaluation (Select based on Median) ",5),rep("Key Feature Evaluation (Select based on Frequency) ",5)),
  `Metric` = rep(Feature_and_model_Metrics, 4),
  `Logistic_regression` = c(ModelTrain_stage_Logistic_metrics),
  `Elastic_Net` = c(ModelTrain_stage_ElasticNet_metrics),
  `XGBoost` = c(ModelTrain_stage_XGBoost_metrics),
  `Random_Forest` = c(ModelTrain_stage_RandomForest_metrics),
  `SVM` = c(ModelTrain_stage_SVM_metrics)
)


print(Metrics_results_df)
##    Number_of_CpG_used Number_of_Phenotype_Features_Used
## 1                5000                                 5
## 2                5000                                 5
## 3                5000                                 5
## 4                5000                                 5
## 5                5000                                 5
## 6                5000                                 5
## 7                5000                                 5
## 8                5000                                 5
## 9                5000                                 5
## 10               5000                                 5
## 11               5000                                 5
## 12               5000                                 5
## 13               5000                                 5
## 14               5000                                 5
## 15               5000                                 5
## 16               5000                                 5
## 17               5000                                 5
## 18               5000                                 5
## 19               5000                                 5
## 20               5000                                 5
##    Total_Number_of_features_before_Preprocessing Number_of_features_after_processing
## 1                                           5005                                 155
## 2                                           5005                                 155
## 3                                           5005                                 155
## 4                                           5005                                 155
## 5                                           5005                                 155
## 6                                           5005                                 155
## 7                                           5005                                 155
## 8                                           5005                                 155
## 9                                           5005                                 155
## 10                                          5005                                 155
## 11                                          5005                                 155
## 12                                          5005                                 155
## 13                                          5005                                 155
## 14                                          5005                                 155
## 15                                          5005                                 155
## 16                                          5005                                 155
## 17                                          5005                                 155
## 18                                          5005                                 155
## 19                                          5005                                 155
## 20                                          5005                                 155
##    Classification_Type Number_of_Key_features_Selected_.Mean.Median.
## 1           Multiclass                                           250
## 2           Multiclass                                           250
## 3           Multiclass                                           250
## 4           Multiclass                                           250
## 5           Multiclass                                           250
## 6           Multiclass                                           250
## 7           Multiclass                                           250
## 8           Multiclass                                           250
## 9           Multiclass                                           250
## 10          Multiclass                                           250
## 11          Multiclass                                           250
## 12          Multiclass                                           250
## 13          Multiclass                                           250
## 14          Multiclass                                           250
## 15          Multiclass                                           250
## 16          Multiclass                                           250
## 17          Multiclass                                           250
## 18          Multiclass                                           250
## 19          Multiclass                                           250
## 20          Multiclass                                           250
##    Number_of_Key_features_remained_based_on_frequency_methods
## 1                                                         155
## 2                                                         155
## 3                                                         155
## 4                                                         155
## 5                                                         155
## 6                                                         155
## 7                                                         155
## 8                                                         155
## 9                                                         155
## 10                                                        155
## 11                                                        155
## 12                                                        155
## 13                                                        155
## 14                                                        155
## 15                                                        155
## 16                                                        155
## 17                                                        155
## 18                                                        155
## 19                                                        155
## 20                                                        155
##                                          Metrics_Stage
## 1                                    Model Train Stage
## 2                                    Model Train Stage
## 3                                    Model Train Stage
## 4                                    Model Train Stage
## 5                                    Model Train Stage
## 6       Key Feature Evaluation (Select based on Mean) 
## 7       Key Feature Evaluation (Select based on Mean) 
## 8       Key Feature Evaluation (Select based on Mean) 
## 9       Key Feature Evaluation (Select based on Mean) 
## 10      Key Feature Evaluation (Select based on Mean) 
## 11    Key Feature Evaluation (Select based on Median) 
## 12    Key Feature Evaluation (Select based on Median) 
## 13    Key Feature Evaluation (Select based on Median) 
## 14    Key Feature Evaluation (Select based on Median) 
## 15    Key Feature Evaluation (Select based on Median) 
## 16 Key Feature Evaluation (Select based on Frequency) 
## 17 Key Feature Evaluation (Select based on Frequency) 
## 18 Key Feature Evaluation (Select based on Frequency) 
## 19 Key Feature Evaluation (Select based on Frequency) 
## 20 Key Feature Evaluation (Select based on Frequency) 
##                                           Metric Logistic_regression Elastic_Net   XGBoost
## 1                              Training Accuracy           0.9604396   0.8637363 1.0000000
## 2                                  Test Accuracy           0.7098446   0.7202073 0.5854922
## 3                                     Test Kappa           0.4987013   0.4986772 0.2510671
## 4                                            AUC           0.8328700   0.8566272 0.7357863
## 5  Average Test Accuracy during Cross Validation           0.6331631   0.5868408 0.5686429
## 6                              Training Accuracy           0.9604396   0.8637363 1.0000000
## 7                                  Test Accuracy           0.7098446   0.7202073 0.6373057
## 8                                     Test Kappa           0.4987013   0.4986772 0.3435693
## 9                                            AUC           0.8329421   0.8566272 0.6960104
## 10 Average Test Accuracy during Cross Validation           0.6326693   0.5868952 0.5663579
## 11                             Training Accuracy           0.9604396   0.8637363 1.0000000
## 12                                 Test Accuracy           0.7098446   0.7202073 0.6269430
## 13                                    Test Kappa           0.4987013   0.4986772 0.3309903
## 14                                           AUC           0.8329058   0.8566272 0.6975005
## 15 Average Test Accuracy during Cross Validation           0.6326693   0.5868408 0.5662996
## 16                             Training Accuracy           0.9604396   0.8637363 1.0000000
## 17                                 Test Accuracy           0.7098446   0.7202073 0.6528497
## 18                                    Test Kappa           0.4987013   0.4986772 0.3718547
## 19                                           AUC           0.8328700   0.8566272 0.7359191
## 20 Average Test Accuracy during Cross Validation           0.6329108   0.5868408 0.5677591
##    Random_Forest       SVM
## 1      1.0000000 0.9384615
## 2      0.5699482 0.6632124
## 3      0.1684489 0.4562673
## 4      0.6475618 0.5617548
## 5      0.5473039 0.7135069
## 6      1.0000000 0.9516484
## 7      0.5647668 0.6839378
## 8      0.1467368 0.4754734
## 9      0.6536684 0.5420210
## 10     0.5443487 0.7114723
## 11     1.0000000 0.9538462
## 12     0.5492228 0.7461140
## 13     0.1343060 0.5688625
## 14     0.6415256 0.5198441
## 15     0.5428752 0.6769049
## 16     1.0000000 0.9582418
## 17     0.5492228 0.7305699
## 18     0.1283742 0.5439426
## 19     0.6647513 0.5247258
## 20     0.5472878 0.6995634

Write out the data frame (Model Metrics) to a CSV file if FLAG_WRITE_METRICS_DF = TRUE.

if(FLAG_WRITE_METRICS_DF){
  write.csv(Metrics_results_df,OUTUT_PerformanceMetricsCSV_PATHNAME,row.names = FALSE)
  print("Metrics Performance output path:")
  print(OUTUT_PerformanceMetricsCSV_PATHNAME)
}
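write.csv fails if the destination folder does not exist, so when the output path may be new, the folder can be created first. A guarded sketch of the same write; “write_metrics_csv” is a hypothetical helper:

```r
# Create the output folder (recursively) if needed, then write the metrics
# data frame exactly as above.
write_metrics_csv <- function(df, path) {
  out_dir <- dirname(path)
  if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE)
  write.csv(df, path, row.names = FALSE)
}
# e.g. if (FLAG_WRITE_METRICS_DF)
#   write_metrics_csv(Metrics_results_df, OUTUT_PerformanceMetricsCSV_PATHNAME)
```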
## [1] "Metrics Performance output path:"
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method1_MultiClass\\Method1_MultiClass_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"

Appendix - Variables

Overview of the Data Frame Variables.

  • Phenotype Part Data frame: “phenoticPart_RAW”

  • RAW Merged Data frame: “merged_df_raw”

  • Processed Data, i.e. the data used for model training.

    • The name for “processed_data” could be:

      • “processed_data_m1”, which uses method one to process the data.

      • “processed_data_m2”, which uses method two to process the data; note that the features will be principal components.

      • “processed_data_m3”, which uses method three to process the data. This method transfers “DX” to a binary class: “CN” stays the same, and “MCI”/“Dementia” are transferred to “CI”.

        Note that “processed_data_m3_df” is the data frame format of “processed_data_m3”, with sample names as row names, and it will be assigned to “processed_dataFrame”.

      • “processed_data_m4”, which uses method four to process the data. This method filters “DX” (drops the “MCI” class), keeping only the CN and Dementia (AD) classes.

      • “processed_data_m5”, which uses method five to process the data. This method filters “DX” (drops the “Dementia” class), keeping only the CN and MCI classes.

      • “processed_data_m6”, which uses method six to process the data. This method filters “DX” (drops the “CN” class), keeping only the MCI and Dementia classes.

    • The name for “AfterProcess_FeatureName” could be:

      • “AfterProcess_FeatureName_m1”: the column names of the data frame processed with method one.
      • “AfterProcess_FeatureName_m2”: the column names under the principal component method.
      • “AfterProcess_FeatureName_m3”: the column names of the data frame processed with method three (“DX” transferred to the binary classes “CN” and “CI”).
      • “AfterProcess_FeatureName_m4”: the column names of the data frame processed with method four (“MCI” dropped; CN and Dementia (AD) classes only).
      • “AfterProcess_FeatureName_m5”: the column names of the data frame processed with method five (“Dementia” dropped; CN and MCI classes only).
      • “AfterProcess_FeatureName_m6”: the column names of the data frame processed with method six (“CN” dropped; MCI and Dementia classes only).
  • Ordered Feature Importance (quantile-based) Data Frame: “combined_importance_quantiles”

  • Ordered Feature Importance (mean-based) Data Frame: “combined_importance_Avg_ordered”

  • Feature Frequency / Common Data Frame:

    • “frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count. The top number selected in the first step is set in the Input Session via “INPUT_NUMBER_FEATURES”.

    • “feature_df_full”: the frequencies of all features under the steps of the frequency method; not ordered.

    • “all_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.

  • Output data frame with features selected by the mean method: “df_selected_Mean”. This data frame does not have a column named “SampleID”.

    • The feature names: “selected_impAvg_ordered_NAME”

  • Output data frame with features selected by the median method: “df_selected_Median”. This data frame does not have a column named “SampleID”.

    • The feature names: “Selected_median_imp_Name”

  • Output data frame with features selected by the frequency / common-feature method: “df_process_Output_freq”. This data frame does not have a column named “SampleID”.

    • The feature names: “df_process_frequency_FeatureName”

    • “df_feature_Output_frequency”: the selected features’ frequencies, ordered by total frequency count. The top number selected in the first step is set in the Input Session via “NUM_COMMON_FEATURES_SET_Frequency”.

    • “Selected_Frequency_Feature_importance”: the importance values of the selected features, ordered by total frequency count.

    • “feature_output_df_full”: the frequencies of all features under the steps of the frequency method; not ordered.

    • “all_Output_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.
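As a compact illustration of the frequency / common-feature variables listed above, the rule itself (take the top-N features per model, count appearances, keep the features that appear more than half the time) can be sketched as follows; “common_features” and “top_lists” are hypothetical names:

```r
# Keep features that appear in the per-model top-N lists more than half the
# time; top_lists is a named list with one top-N feature vector per model.
common_features <- function(top_lists) {
  freq <- table(unlist(top_lists))             # appearance count per feature
  names(freq)[freq > length(top_lists) / 2]    # strictly more than half
}
```

For example, with five models a feature must appear in at least three of the five top-N lists to be kept.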

Overview of the Metrics Variables.

  • Number of CpG used: “Number_N_TopNCpGs”

  • Phenotype features selected:

    • Multi: “age.now”,“PTGENDER”, “PC1”,“PC2”,“PC3” (Total number: 5)
    • Binary: “age.now”,“PTGENDER”,“PC1”,“PC2”,“PC3” (Total number: 5)
  • Number of features before processing: (#Phenotype features selected) + (#CpGs Used)

  • Number of features after processing (DMP, data cleaning): “Num_feaForProcess”

  • Model performance (variable names) - Model Training Stage:

    | Initial Model Training Metric | Logistic regression | Elastic Net | XGBoost | Random Forest | SVM |
    |---|---|---|---|---|---|
    | Training Accuracy | modelTrain_LRM1_trainAccuracy | modelTrain_ENM1_trainAccuracy | modelTrain_xgb_trainAccuracy | modelTrain_rf_trainAccuracy | modelTrain_svm_trainAccuracy |
    | Test Accuracy | cm_modelTrain_LRM1_Accuracy | cm_modelTrain_ENM1_Accuracy | cm_modelTrain_xgb_Accuracy | cm_modelTrain_rf_Accuracy | cm_modelTrain_svm_Accuracy |
    | Test Kappa | cm_modelTrain_LRM1_Kappa | cm_modelTrain_ENM1_Kappa | cm_modelTrain_xgb_Kappa | cm_modelTrain_rf_Kappa | cm_modelTrain_svm_Kappa |
    | AUC (for multiclass, the mean one-vs-rest AUC) | modelTrain_LRM1_AUC | modelTrain_ENM1_AUC | modelTrain_xgb_AUC | modelTrain_rf_AUC | modelTrain_svm_AUC |
    | Average Test Accuracy during Cross Validation | modelTrain_mean_accuracy_cv_LRM1 | modelTrain_mean_accuracy_cv_ENM1 | modelTrain_mean_accuracy_cv_xgb | modelTrain_mean_accuracy_cv_rf | modelTrain_mean_accuracy_cv_svm |
  • Number of Key features selected (Mean/Median methods): “INPUT_NUMBER_FEATURES”

  • Number of Key features retained by the frequency method: “Num_KeyFea_Frequency”

  • Performance of the set of key features (selected under the 3 methods):

    Based on Mean:

    | Key Features Performance (Selected based on Mean) | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
    |---|---|---|---|---|---|
    | Training Accuracy | FeatEval_Mean_LRM1_trainAccuracy | FeatEval_Mean_ENM1_trainAccuracy | FeatEval_Mean_xgb_trainAccuracy | FeatEval_Mean_rf_trainAccuracy | FeatEval_Mean_svm_trainAccuracy |
    | Test Accuracy | cm_FeatEval_Mean_LRM1_Accuracy | cm_FeatEval_Mean_ENM1_Accuracy | cm_FeatEval_Mean_xgb_Accuracy | cm_FeatEval_Mean_rf_Accuracy | cm_FeatEval_Mean_svm_Accuracy |
    | Test Kappa | cm_FeatEval_Mean_LRM1_Kappa | cm_FeatEval_Mean_ENM1_Kappa | cm_FeatEval_Mean_xgb_Kappa | cm_FeatEval_Mean_rf_Kappa | cm_FeatEval_Mean_svm_Kappa |
    | AUC (for multiclass, the mean one-vs-rest AUC) | FeatEval_Mean_LRM1_AUC | FeatEval_Mean_ENM1_AUC | FeatEval_Mean_xgb_AUC | FeatEval_Mean_rf_AUC | FeatEval_Mean_svm_AUC |
    | Average Test Accuracy during Cross Validation | FeatEval_Mean_mean_accuracy_cv_LRM1 | FeatEval_Mean_mean_accuracy_cv_ENM1 | FeatEval_Mean_mean_accuracy_cv_xgb | FeatEval_Mean_mean_accuracy_cv_rf | FeatEval_Mean_mean_accuracy_cv_svm |

    Based on Median:

    | Key Features Performance (Selected based on Median) | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
    |---|---|---|---|---|---|
    | Training Accuracy | FeatEval_Median_LRM1_trainAccuracy | FeatEval_Median_ENM1_trainAccuracy | FeatEval_Median_xgb_trainAccuracy | FeatEval_Median_rf_trainAccuracy | FeatEval_Median_svm_trainAccuracy |
    | Test Accuracy | cm_FeatEval_Median_LRM1_Accuracy | cm_FeatEval_Median_ENM1_Accuracy | cm_FeatEval_Median_xgb_Accuracy | cm_FeatEval_Median_rf_Accuracy | cm_FeatEval_Median_svm_Accuracy |
    | Test Kappa | cm_FeatEval_Median_LRM1_Kappa | cm_FeatEval_Median_ENM1_Kappa | cm_FeatEval_Median_xgb_Kappa | cm_FeatEval_Median_rf_Kappa | cm_FeatEval_Median_svm_Kappa |
    | AUC (for multiclass, the mean one-vs-rest AUC) | FeatEval_Median_LRM1_AUC | FeatEval_Median_ENM1_AUC | FeatEval_Median_xgb_AUC | FeatEval_Median_rf_AUC | FeatEval_Median_svm_AUC |
    | Average Test Accuracy during Cross Validation | FeatEval_Median_mean_accuracy_cv_LRM1 | FeatEval_Median_mean_accuracy_cv_ENM1 | FeatEval_Median_mean_accuracy_cv_xgb | FeatEval_Median_mean_accuracy_cv_rf | FeatEval_Median_mean_accuracy_cv_svm |

    Based on Frequency:

    | Key Features Performance (Selected based on Frequency) | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
    |---|---|---|---|---|---|
    | Training Accuracy | FeatEval_Freq_LRM1_trainAccuracy | FeatEval_Freq_ENM1_trainAccuracy | FeatEval_Freq_xgb_trainAccuracy | FeatEval_Freq_rf_trainAccuracy | FeatEval_Freq_svm_trainAccuracy |
    | Test Accuracy | cm_FeatEval_Freq_LRM1_Accuracy | cm_FeatEval_Freq_ENM1_Accuracy | cm_FeatEval_Freq_xgb_Accuracy | cm_FeatEval_Freq_rf_Accuracy | cm_FeatEval_Freq_svm_Accuracy |
    | Test Kappa | cm_FeatEval_Freq_LRM1_Kappa | cm_FeatEval_Freq_ENM1_Kappa | cm_FeatEval_Freq_xgb_Kappa | cm_FeatEval_Freq_rf_Kappa | cm_FeatEval_Freq_svm_Kappa |
    | AUC (for multiclass, the mean one-vs-rest AUC) | FeatEval_Freq_LRM1_AUC | FeatEval_Freq_ENM1_AUC | FeatEval_Freq_xgb_AUC | FeatEval_Freq_rf_AUC | FeatEval_Freq_svm_AUC |
    | Average Test Accuracy during Cross Validation | FeatEval_Freq_mean_accuracy_cv_LRM1 | FeatEval_Freq_mean_accuracy_cv_ENM1 | FeatEval_Freq_mean_accuracy_cv_xgb | FeatEval_Freq_mean_accuracy_cv_rf | FeatEval_Freq_mean_accuracy_cv_svm |